Overview

Dataset statistics

Number of variables: 19
Number of observations: 45379
Missing cells: 87543
Missing cells (%): 10.2%
Duplicate rows: 16
Duplicate rows (%): < 0.1%
Total size in memory: 52.2 MiB
Average record size in memory: 1.2 KiB
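
The percentages above follow from the raw counts. As a rough check, the sketch below recomputes the missing-cell share and the average record size from the reported figures. It assumes that the missing percentage is taken over all cells (variables × observations) and that 1 MiB = 1024 KiB.

```python
# Figures copied from the dataset statistics above.
n_variables = 19
n_observations = 45379
missing_cells = 87543
total_mib = 52.2

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 KiB.
avg_record_kib = total_mib * 1024 / n_observations

print(total_cells)               # 862201
print(round(missing_pct, 1))     # 10.2
print(round(avg_record_kib, 1))  # 1.2
```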

Variable types

Categorical: 11
Numeric: 8

Alerts

Dataset has 16 (< 0.1%) duplicate rows [Duplicates]
belongs_to_collection has a high cardinality: 1695 distinct values [High cardinality]
genres has a high cardinality: 4065 distinct values [High cardinality]
original_language has a high cardinality: 89 distinct values [High cardinality]
overview has a high cardinality: 44235 distinct values [High cardinality]
production_companies has a high cardinality: 22670 distinct values [High cardinality]
production_countries has a high cardinality: 2389 distinct values [High cardinality]
release_date has a high cardinality: 17334 distinct values [High cardinality]
spoken_languages has a high cardinality: 1932 distinct values [High cardinality]
tagline has a high cardinality: 20269 distinct values [High cardinality]
title has a high cardinality: 42195 distinct values [High cardinality]
budget is highly overall correlated with revenue and 1 other field [High correlation]
revenue is highly overall correlated with budget and 1 other field [High correlation]
return is highly overall correlated with budget and 1 other field [High correlation]
original_language is highly imbalanced (67.4%) [Imbalance]
production_countries is highly imbalanced (57.7%) [Imbalance]
spoken_languages is highly imbalanced (61.3%) [Imbalance]
status is highly imbalanced (97.0%) [Imbalance]
belongs_to_collection has 40890 (90.1%) missing values [Missing]
genres has 2384 (5.3%) missing values [Missing]
overview has 941 (2.1%) missing values [Missing]
production_companies has 11796 (26.0%) missing values [Missing]
production_countries has 6211 (13.7%) missing values [Missing]
tagline has 24981 (55.0%) missing values [Missing]
popularity is highly skewed (γ1 = 29.21581948) [Skewed]
return is highly skewed (γ1 = 138.3340992) [Skewed]
overview is uniformly distributed [Uniform]
tagline is uniformly distributed [Uniform]
title is uniformly distributed [Uniform]
budget has 36493 (80.4%) zeros [Zeros]
revenue has 37972 (83.7%) zeros [Zeros]
runtime has 1535 (3.4%) zeros [Zeros]
vote_average has 2947 (6.5%) zeros [Zeros]
return has 39998 (88.1%) zeros [Zeros]
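
The skewness alerts quote γ1, the standardized third moment. The sketch below computes the plain population version, g1 = m3 / m2^1.5, on two small made-up samples. The function name and the data are illustrative only, and the report's own estimator may apply a bias correction that this sketch omits. It shows why a column that is mostly zeros with a few large values gets a large positive γ1.

```python
def skewness(values):
    """Population Fisher-Pearson skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # a constant column has no skew
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]    # hypothetical symmetric sample
zero_heavy = [0, 0, 0, 0, 10]  # hypothetical sample dominated by zeros

print(skewness(symmetric))   # 0.0
print(skewness(zero_heavy))  # 1.5
```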

Reproduction

Analysis started: 2023-05-16 00:24:54.020517
Analysis finished: 2023-05-16 00:25:16.552225
Duration: 22.53 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json
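
A report like this one can typically be regenerated with a few lines of Python. The sketch below assumes the data lives in a CSV file and that pandas and pandas-profiling 3.x are installed. The function name, title and file paths are hypothetical, and the actual configuration used for this run is the one in config.json, which is not reproduced here.

```python
REPORT_TITLE = "Movies dataset profile"  # hypothetical title


def build_report(csv_path, html_path="report.html"):
    """Profile the CSV at csv_path and write an HTML report to html_path."""
    # Imported lazily so this module still loads where the packages are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=REPORT_TITLE)
    report.to_file(html_path)
    return html_path
```

Calling `build_report("movies.csv")` (a hypothetical path) would write `report.html` next to the script. Later releases of the package are distributed as ydata-profiling, so the import line may need adjusting on newer installations.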

Variables

belongs_to_collection
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 1695
Distinct (%): 37.8%
Missing: 40890
Missing (%): 90.1%
Memory size: 1.6 MiB

The Bowery Boys                 29
Totò Collection                27
James Bond Collection           26
Zatôichi: The Blind Swordsman  26
The Carry On Collection         25
Other values (1690)             4356

Length

Max length: 66
Median length: 48
Mean length: 24.027623
Min length: 3

Characters and Unicode

Total characters: 107860
Distinct characters: 147
Distinct categories: 19
Distinct scripts: 2
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 390
Unique (%): 8.7%

Sample

1st row: Toy Story Collection
2nd row: Grumpy Old Men Collection
3rd row: Father of the Bride Collection
4th row: James Bond Collection
5th row: Balto Collection

Common Values

Value                                    Count   Frequency (%)
The Bowery Boys                          29      0.1%
Totò Collection                         27      0.1%
James Bond Collection                    26      0.1%
Zatôichi: The Blind Swordsman           26      0.1%
The Carry On Collection                  25      0.1%
Pokémon Collection                      22      < 0.1%
Charlie Chan (Sidney Toler) Collection   21      < 0.1%
Godzilla (Showa) Collection              16      < 0.1%
Uuno Turhapuro                           15      < 0.1%
Dragon Ball Z (Movie) Collection         15      < 0.1%
Other values (1685)                      4267    9.4%
(Missing)                                40890   90.1%
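
Values such as "Totò Collection" and "Pokémon Collection" look like UTF-8 text that was decoded with a single-byte codec somewhere upstream. The sketch below is one possible repair, assuming the wrong codec was Latin-1. The helper name is hypothetical, and strings that do not round-trip cleanly (including text mis-decoded with another codec) are returned unchanged rather than guessed at.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up; return the input unchanged on failure."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or not valid UTF-8 afterwards: leave as-is.
        return text


for value in ["Totò Collection", "Pokémon Collection", "James Bond Collection"]:
    print(repair_mojibake(value))
```

Applying a repair like this before profiling would change the character and block statistics reported for the column, so any comparison with the figures in this report should use the unrepaired data.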

Length

Histogram of lengths of the category
ValueCountFrequency (%)
collection 3744
25.3%
the 1146
 
7.8%
of 230
 
1.6%
series 147
 
1.0%
139
 
0.9%
trilogy 87
 
0.6%
and 84
 
0.6%
a 62
 
0.4%
man 62
 
0.4%
in 56
 
0.4%
Other values (2407) 9030
61.1%

Most occurring characters

ValueCountFrequency (%)
o 11117
 
10.3%
e 10452
 
9.7%
10299
 
9.5%
l 10203
 
9.5%
i 7560
 
7.0%
n 7404
 
6.9%
t 6489
 
6.0%
c 4848
 
4.5%
C 4475
 
4.1%
a 4461
 
4.1%
Other values (137) 30552
28.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 80635
74.8%
Uppercase Letter 14513
 
13.5%
Space Separator 10300
 
9.5%
Other Punctuation 650
 
0.6%
Open Punctuation 362
 
0.3%
Close Punctuation 335
 
0.3%
Decimal Number 321
 
0.3%
Dash Punctuation 161
 
0.1%
Other Number 128
 
0.1%
Modifier Symbol 87
 
0.1%
Other values (9) 368
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 11117
13.8%
e 10452
13.0%
l 10203
12.7%
i 7560
9.4%
n 7404
9.2%
t 6489
8.0%
c 4848
6.0%
a 4461
 
5.5%
r 3872
 
4.8%
s 2588
 
3.2%
Other values (27) 11641
14.4%
Uppercase Letter
ValueCountFrequency (%)
C 4475
30.8%
T 1527
 
10.5%
S 1064
 
7.3%
B 682
 
4.7%
M 631
 
4.3%
A 509
 
3.5%
D 505
 
3.5%
H 462
 
3.2%
P 432
 
3.0%
G 417
 
2.9%
Other values (25) 3809
26.2%
Other Punctuation
ValueCountFrequency (%)
. 172
26.5%
' 107
16.5%
: 99
15.2%
, 79
12.2%
& 52
 
8.0%
! 35
 
5.4%
/ 21
 
3.2%
21
 
3.2%
16
 
2.5%
10
 
1.5%
Other values (8) 38
 
5.8%
Decimal Number
ValueCountFrequency (%)
1 80
24.9%
9 64
19.9%
3 54
16.8%
0 51
15.9%
2 21
 
6.5%
8 13
 
4.0%
5 12
 
3.7%
7 11
 
3.4%
6 10
 
3.1%
4 5
 
1.6%
Other Number
ValueCountFrequency (%)
¾ 37
28.9%
² 36
28.1%
³ 18
14.1%
¼ 14
 
10.9%
½ 14
 
10.9%
¹ 9
 
7.0%
Currency Symbol
ValueCountFrequency (%)
¤ 43
53.1%
28
34.6%
¥ 5
 
6.2%
£ 3
 
3.7%
¢ 2
 
2.5%
Modifier Symbol
ValueCountFrequency (%)
¸ 41
47.1%
´ 39
44.8%
¯ 3
 
3.4%
˜ 2
 
2.3%
¨ 2
 
2.3%
Control
ValueCountFrequency (%)
 25
48.1%
 19
36.5%
 5
 
9.6%
 2
 
3.8%
 1
 
1.9%
Open Punctuation
ValueCountFrequency (%)
( 330
91.2%
20
 
5.5%
7
 
1.9%
[ 5
 
1.4%
Other Symbol
ValueCountFrequency (%)
© 45
53.6%
° 23
27.4%
9
 
10.7%
¦ 7
 
8.3%
Final Punctuation
ValueCountFrequency (%)
» 48
87.3%
6
 
10.9%
1
 
1.8%
Initial Punctuation
ValueCountFrequency (%)
11
61.1%
6
33.3%
1
 
5.6%
Space Separator
ValueCountFrequency (%)
10299
> 99.9%
  1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 330
98.5%
] 5
 
1.5%
Dash Punctuation
ValueCountFrequency (%)
- 160
99.4%
1
 
0.6%
Other Letter
ValueCountFrequency (%)
º 33
91.7%
ª 3
 
8.3%
Math Symbol
ValueCountFrequency (%)
± 21
91.3%
¬ 2
 
8.7%
Modifier Letter
ValueCountFrequency (%)
ˆ 10
100.0%
Format
ValueCountFrequency (%)
­ 9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 95157
88.2%
Common 12703
 
11.8%

Most frequent character per script

Common
ValueCountFrequency (%)
10299
81.1%
( 330
 
2.6%
) 330
 
2.6%
. 172
 
1.4%
- 160
 
1.3%
' 107
 
0.8%
: 99
 
0.8%
1 80
 
0.6%
, 79
 
0.6%
9 64
 
0.5%
Other values (64) 983
 
7.7%
Latin
ValueCountFrequency (%)
o 11117
11.7%
e 10452
11.0%
l 10203
10.7%
i 7560
 
7.9%
n 7404
 
7.8%
t 6489
 
6.8%
c 4848
 
5.1%
C 4475
 
4.7%
a 4461
 
4.7%
r 3872
 
4.1%
Other values (63) 24276
25.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 106378
98.6%
None 1348
 
1.2%
Punctuation 85
 
0.1%
Currency Symbols 28
 
< 0.1%
Modifier Letters 12
 
< 0.1%
Letterlike Symbols 9
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 11117
 
10.5%
e 10452
 
9.8%
10299
 
9.7%
l 10203
 
9.6%
i 7560
 
7.1%
n 7404
 
7.0%
t 6489
 
6.1%
c 4848
 
4.6%
C 4475
 
4.2%
a 4461
 
4.2%
Other values (67) 29070
27.3%
None
ValueCountFrequency (%)
Ð 295
21.9%
à 213
15.8%
Ñ 119
 
8.8%
» 48
 
3.6%
© 45
 
3.3%
¤ 43
 
3.2%
¸ 41
 
3.0%
´ 39
 
2.9%
¾ 37
 
2.7%
² 36
 
2.7%
Other values (44) 432
32.0%
Currency Symbols
ValueCountFrequency (%)
28
100.0%
Punctuation
ValueCountFrequency (%)
20
23.5%
16
18.8%
11
12.9%
10
11.8%
7
 
8.2%
6
 
7.1%
6
 
7.1%
3
 
3.5%
3
 
3.5%
1
 
1.2%
Other values (2) 2
 
2.4%
Modifier Letters
ValueCountFrequency (%)
ˆ 10
83.3%
˜ 2
 
16.7%
Letterlike Symbols
ValueCountFrequency (%)
9
100.0%

budget
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1223
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4232324.6
Minimum: 0
Maximum: 3.8 × 10^8
Zeros: 36493
Zeros (%): 80.4%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 25000000
Maximum: 3.8 × 10^8
Range: 3.8 × 10^8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17439317
Coefficient of variation (CV): 4.1205056
Kurtosis: 66.63901
Mean: 4232324.6
Median Absolute Deviation (MAD): 0
Skewness: 7.1185794
Sum: 1.9205866 × 10^11
Variance: 3.0412978 × 10^14
Monotonicity: Not monotonic
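
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks the reported budget figures against those identities. The inputs are the rounded values printed in the report, so agreement is only expected to within rounding.

```python
from math import isclose

# Rounded figures copied from the budget statistics above.
n = 45379
mean = 4232324.6
std = 17439317

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the squared standard deviation
total = mean * n      # sum of all budgets

# Compare with the values printed in the report, to within rounding.
print(isclose(cv, 4.1205056, rel_tol=1e-6))
print(isclose(variance, 3.0412978e14, rel_tol=1e-6))
print(isclose(total, 1.9205866e11, rel_tol=1e-6))
```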
Histogram with fixed size bins (bins=50)
Common Values

Value                 Count   Frequency (%)
0                     36493   80.4%
5000000               286     0.6%
10000000              259     0.6%
20000000              243     0.5%
2000000               242     0.5%
15000000              226     0.5%
3000000               223     0.5%
25000000              206     0.5%
1000000               197     0.4%
30000000              190     0.4%
Other values (1213)   6814    15.0%

Minimum 10 values

Value   Count   Frequency (%)
0       36493   80.4%
1       25      0.1%
2       14      < 0.1%
3       9       < 0.1%
4       8       < 0.1%
5       8       < 0.1%
6       5       < 0.1%
7       4       < 0.1%
8       5       < 0.1%
9       1       < 0.1%

Maximum 10 values

Value       Count   Frequency (%)
380000000   1       < 0.1%
300000000   1       < 0.1%
280000000   1       < 0.1%
270000000   1       < 0.1%
260000000   3       < 0.1%
258000000   1       < 0.1%
255000000   1       < 0.1%
250000000   10      < 0.1%
245000000   2       < 0.1%
237000000   1       < 0.1%

genres
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 4065
Distinct (%): 9.5%
Missing: 2384
Missing (%): 5.3%
Memory size: 3.3 MiB

['Drama']              4998
['Comedy']             3621
['Documentary']        2713
['Drama', 'Romance']   1301
['Comedy', 'Drama']    1135
Other values (4060)    29227

Length

Max length: 98
Median length: 81
Mean length: 22.699105
Min length: 7

Characters and Unicode

Total characters: 975948
Distinct characters: 33
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2365
Unique (%): 5.5%

Sample

1st row: ['Animation', 'Comedy', 'Family']
2nd row: ['Adventure', 'Fantasy', 'Family']
3rd row: ['Romance', 'Comedy']
4th row: ['Comedy', 'Drama', 'Romance']
5th row: ['Comedy']

Common Values

Value                            Count   Frequency (%)
['Drama']                        4998    11.0%
['Comedy']                       3621    8.0%
['Documentary']                  2713    6.0%
['Drama', 'Romance']             1301    2.9%
['Comedy', 'Drama']              1135    2.5%
['Horror']                       974     2.1%
['Comedy', 'Romance']            930     2.0%
['Comedy', 'Drama', 'Romance']   593     1.3%
['Drama', 'Comedy']              532     1.2%
['Horror', 'Thriller']           528     1.2%
Other values (4055)              25670   56.6%
(Missing)                        2384    5.3%
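
The table counts each stringified list as a single value, so ['Comedy', 'Drama'] and ['Drama', 'Comedy'] appear as separate entries. The sketch below shows one way to parse such strings and count either individual genres or order-independent combinations. It assumes the column holds Python-style list literals as in the samples above, and the helper name and the tiny input list are illustrative.

```python
import ast
from collections import Counter

# Hypothetical excerpt of the column; None stands in for a missing value.
raw_values = ["['Comedy', 'Drama']", "['Drama', 'Comedy']", "['Drama']", None]


def parse_genres(raw):
    """Turn a stringified list such as "['Drama']" into a real list."""
    if raw is None:
        return []
    return ast.literal_eval(raw)


# Count individual genres across all rows.
per_genre = Counter(g for raw in raw_values for g in parse_genres(raw))

# Count genre combinations while ignoring the order inside each list.
per_set = Counter(
    frozenset(parse_genres(raw)) for raw in raw_values if raw is not None
)

print(per_genre["Drama"], per_genre["Comedy"])   # 3 2
print(per_set[frozenset({"Comedy", "Drama"})])   # 2
```

Counting order-independent combinations would lower the distinct count relative to the 4065 reported here, which helps explain part of the high-cardinality alert.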

Length

Histogram of lengths of the category
ValueCountFrequency (%)
drama 20256
21.4%
comedy 13181
13.9%
thriller 7620
 
8.0%
romance 6733
 
7.1%
action 6594
 
7.0%
horror 4671
 
4.9%
crime 4305
 
4.5%
documentary 3921
 
4.1%
adventure 3494
 
3.7%
science 3044
 
3.2%
Other values (12) 21037
22.2%

Most occurring characters

ValueCountFrequency (%)
' 182090
18.7%
r 69076
 
7.1%
a 61816
 
6.3%
e 55772
 
5.7%
m 53097
 
5.4%
51861
 
5.3%
o 48533
 
5.0%
, 48050
 
4.9%
[ 42995
 
4.4%
] 42995
 
4.4%
Other values (23) 319663
32.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 512334
52.5%
Other Punctuation 230140
23.6%
Uppercase Letter 95623
 
9.8%
Space Separator 51861
 
5.3%
Open Punctuation 42995
 
4.4%
Close Punctuation 42995
 
4.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 69076
13.5%
a 61816
12.1%
e 55772
10.9%
m 53097
10.4%
o 48533
9.5%
i 39668
7.7%
n 35672
7.0%
y 28508
5.6%
c 27978
5.5%
t 26202
 
5.1%
Other values (7) 66012
12.9%
Uppercase Letter
ValueCountFrequency (%)
D 24177
25.3%
C 17486
18.3%
A 12021
12.6%
F 9746
10.2%
T 8387
 
8.8%
R 6733
 
7.0%
H 6068
 
6.3%
M 4829
 
5.1%
S 3044
 
3.2%
W 2365
 
2.5%
Other Punctuation
ValueCountFrequency (%)
' 182090
79.1%
, 48050
 
20.9%
Space Separator
ValueCountFrequency (%)
51861
100.0%
Open Punctuation
ValueCountFrequency (%)
[ 42995
100.0%
Close Punctuation
ValueCountFrequency (%)
] 42995
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 607957
62.3%
Common 367991
37.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 69076
11.4%
a 61816
 
10.2%
e 55772
 
9.2%
m 53097
 
8.7%
o 48533
 
8.0%
i 39668
 
6.5%
n 35672
 
5.9%
y 28508
 
4.7%
c 27978
 
4.6%
t 26202
 
4.3%
Other values (18) 161635
26.6%
Common
ValueCountFrequency (%)
' 182090
49.5%
51861
 
14.1%
, 48050
 
13.1%
[ 42995
 
11.7%
] 42995
 
11.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 975948
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 182090
18.7%
r 69076
 
7.1%
a 61816
 
6.3%
e 55772
 
5.7%
m 53097
 
5.4%
51861
 
5.3%
o 48533
 
5.0%
, 48050
 
4.9%
[ 42995
 
4.4%
] 42995
 
4.4%
Other values (23) 319663
32.8%

id
Real number (ℝ)

Distinct: 45349
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108029.98
Minimum: 2
Maximum: 469172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 2
5-th percentile: 5351.3
Q1: 26386.5
median: 59859
Q3: 156538
95-th percentile: 357170.8
Maximum: 469172
Range: 469170
Interquartile range (IQR): 130151.5

Descriptive statistics

Standard deviation: 112166.71
Coefficient of variation (CV): 1.0382925
Kurtosis: 0.55941532
Mean: 108029.98
Median Absolute Deviation (MAD): 44419
Skewness: 1.2830081
Sum: 4.9022924 × 10^9
Variance: 1.2581372 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value                  Count   Frequency (%)
141971                 3       < 0.1%
23305                  2       < 0.1%
168538                 2       < 0.1%
109962                 2       < 0.1%
119916                 2       < 0.1%
97995                  2       < 0.1%
159849                 2       < 0.1%
84198                  2       < 0.1%
132641                 2       < 0.1%
99080                  2       < 0.1%
Other values (45339)   45358   > 99.9%

Minimum 10 values

Value   Count   Frequency (%)
2       1       < 0.1%
3       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
11      1       < 0.1%
12      1       < 0.1%
13      1       < 0.1%
14      1       < 0.1%
15      1       < 0.1%
16      1       < 0.1%

Maximum 10 values

Value    Count   Frequency (%)
469172   1       < 0.1%
468707   1       < 0.1%
468343   1       < 0.1%
467731   1       < 0.1%
465044   1       < 0.1%
464819   1       < 0.1%
464207   1       < 0.1%
464111   1       < 0.1%
463906   1       < 0.1%
463800   1       < 0.1%
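
With 45349 distinct values across 45379 rows, 30 rows carry an id that already appears elsewhere. The sketch below shows one way to list the repeated ids and their counts. The function name is hypothetical, and the short input list only borrows a few ids from the table above for illustration.

```python
from collections import Counter


def repeated_ids(ids):
    """Return {id: count} for every id that occurs more than once."""
    counts = Counter(ids)
    return {i: c for i, c in counts.items() if c > 1}


# Hypothetical excerpt: 141971 appears three times, 23305 twice.
ids = [2, 3, 141971, 5, 141971, 23305, 141971, 23305]
print(repeated_ids(ids))  # {141971: 3, 23305: 2}

# Surplus rows implied by the report: observations minus distinct ids.
print(45379 - 45349)  # 30
```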

original_language
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 89
Distinct (%): 0.2%
Missing: 11
Missing (%): < 0.1%
Memory size: 2.6 MiB

en                  32204
fr                  2437
it                  1528
ja                  1350
de                  1078
Other values (84)   6771

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 90736
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value               Count   Frequency (%)
en                  32204   71.0%
fr                  2437    5.4%
it                  1528    3.4%
ja                  1350    3.0%
de                  1078    2.4%
es                  992     2.2%
ru                  822     1.8%
hi                  508     1.1%
ko                  444     1.0%
zh                  408     0.9%
Other values (79)   3597    7.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en 32204
71.0%
fr 2437
 
5.4%
it 1528
 
3.4%
ja 1350
 
3.0%
de 1078
 
2.4%
es 992
 
2.2%
ru 822
 
1.8%
hi 508
 
1.1%
ko 444
 
1.0%
zh 408
 
0.9%
Other values (79) 3597
 
7.9%

Most occurring characters

ValueCountFrequency (%)
e 34529
38.1%
n 32912
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1840
 
2.0%
s 1652
 
1.8%
j 1351
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90736
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34529
38.1%
n 32912
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1840
 
2.0%
s 1652
 
1.8%
j 1351
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 90736
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34529
38.1%
n 32912
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1840
 
2.0%
s 1652
 
1.8%
j 1351
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90736
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34529
38.1%
n 32912
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1840
 
2.0%
s 1652
 
1.8%
j 1351
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

overview
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 44235
Distinct (%): 99.5%
Missing: 941
Missing (%): 2.1%
Memory size: 17.8 MiB
No overview found.
 
133
No Overview
 
7
 
5
Adaptation of the Jane Austen novel.
 
3
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers.
 
3
Other values (44230)
44287 

Length

Max length: 1411
Median length: 797
Mean length: 323.80715
Min length: 1

Characters and Unicode

Total characters: 14389342
Distinct characters: 182
Distinct categories: 20
Distinct scripts: 2
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44176
Unique (%): 99.4%

Sample

1st rowLed by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd rowWhen siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd rowA family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th rowCheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th rowJust when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.

Common Values

ValueCountFrequency (%)
No overview found. 133
 
0.3%
No Overview 7
 
< 0.1%
5
 
< 0.1%
Adaptation of the Jane Austen novel. 3
 
< 0.1%
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. 3
 
< 0.1%
No movie overview available. 3
 
< 0.1%
A few funny little novels about different aspects of life. 3
 
< 0.1%
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. 3
 
< 0.1%
Count de Chagnie has discovered Christine's singing talent on a market place and sent her to his friend Carriere, the director of the Parisian opera. However just when she arrives Carriere's dismissed. His arrogant successor refuses to let a woman of low birth sing in his opera, but graciously employs Christine as gadrobiere for his wife Charlotta, who's installed as first singer. He also fights the phantom, an unknown guy who lives since many years in the catacombs below the opera and was granted privileges by Carriere. However the phantom knows how to defend himself and at the same time helps Christine to her career. 2
 
< 0.1%
Adventurer Allan Quartermain leads an expedition into uncharted African territory in an attempt to locate an explorer who went missing during his search for the fabled diamond mines of King Solomon. 2
 
< 0.1%
Other values (44225) 44274
97.6%
(Missing) 941
 
2.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 138084
 
5.6%
a 98895
 
4.0%
and 75262
 
3.1%
to 73325
 
3.0%
of 69576
 
2.8%
in 48144
 
2.0%
is 36500
 
1.5%
his 36164
 
1.5%
with 23904
 
1.0%
her 21485
 
0.9%
Other values (97162) 1827470
74.6%

Most occurring characters

ValueCountFrequency (%)
2406536
16.7%
e 1363841
 
9.5%
a 940540
 
6.5%
t 934821
 
6.5%
i 851544
 
5.9%
o 829900
 
5.8%
n 822626
 
5.7%
s 767878
 
5.3%
r 744329
 
5.2%
h 600839
 
4.2%
Other values (172) 4126488
28.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11149302
77.5%
Space Separator 2406657
 
16.7%
Uppercase Letter 401470
 
2.8%
Other Punctuation 311520
 
2.2%
Decimal Number 42257
 
0.3%
Dash Punctuation 35271
 
0.2%
Open Punctuation 10551
 
0.1%
Close Punctuation 10105
 
0.1%
Currency Symbol 8457
 
0.1%
Other Symbol 6345
 
< 0.1%
Other values (10) 7407
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1363841
12.2%
a 940540
 
8.4%
t 934821
 
8.4%
i 851544
 
7.6%
o 829900
 
7.4%
n 822626
 
7.4%
s 767878
 
6.9%
r 744329
 
6.7%
h 600839
 
5.4%
l 478831
 
4.3%
Other values (34) 2814153
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 42756
 
10.6%
T 35968
 
9.0%
S 31129
 
7.8%
M 23955
 
6.0%
B 23704
 
5.9%
C 22805
 
5.7%
H 19429
 
4.8%
W 18653
 
4.6%
I 16800
 
4.2%
D 16312
 
4.1%
Other values (34) 149959
37.4%
Other Punctuation
ValueCountFrequency (%)
, 133459
42.8%
. 124802
40.1%
' 31128
 
10.0%
" 11661
 
3.7%
: 3301
 
1.1%
? 2759
 
0.9%
! 1543
 
0.5%
/ 766
 
0.2%
& 453
 
0.1%
¡ 338
 
0.1%
Other values (13) 1310
 
0.4%
Decimal Number
ValueCountFrequency (%)
1 9751
23.1%
0 8283
19.6%
9 6406
15.2%
2 4254
10.1%
5 2444
 
5.8%
8 2379
 
5.6%
3 2345
 
5.5%
4 2177
 
5.2%
7 2132
 
5.0%
6 2086
 
4.9%
Currency Symbol
ValueCountFrequency (%)
7550
89.3%
¤ 363
 
4.3%
$ 317
 
3.7%
¥ 96
 
1.1%
£ 89
 
1.1%
¢ 42
 
0.5%
Control
ValueCountFrequency (%)
 705
57.2%
 312
25.3%
 101
 
8.2%
 96
 
7.8%
 18
 
1.5%
1
 
0.1%
Other Number
ValueCountFrequency (%)
¾ 512
26.7%
½ 411
21.4%
¼ 381
19.9%
³ 273
14.2%
² 190
 
9.9%
¹ 151
 
7.9%
Modifier Symbol
ValueCountFrequency (%)
¸ 381
32.3%
´ 272
23.1%
¨ 225
19.1%
˜ 205
17.4%
¯ 84
 
7.1%
` 12
 
1.0%
Math Symbol
ValueCountFrequency (%)
± 376
83.7%
¬ 34
 
7.6%
~ 20
 
4.5%
+ 11
 
2.4%
= 6
 
1.3%
| 2
 
0.4%
Open Punctuation
ValueCountFrequency (%)
( 10024
95.0%
382
 
3.6%
88
 
0.8%
[ 54
 
0.5%
{ 3
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
3947
62.2%
© 1555
 
24.5%
° 421
 
6.6%
¦ 328
 
5.2%
® 94
 
1.5%
Initial Punctuation
ValueCountFrequency (%)
891
81.6%
120
 
11.0%
« 58
 
5.3%
23
 
2.1%
Final Punctuation
ValueCountFrequency (%)
649
68.0%
» 197
 
20.6%
94
 
9.8%
15
 
1.6%
Dash Punctuation
ValueCountFrequency (%)
- 35248
99.9%
14
 
< 0.1%
9
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 10048
99.4%
] 54
 
0.5%
} 3
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2406536
> 99.9%
  121
 
< 0.1%
Other Letter
ValueCountFrequency (%)
º 188
81.7%
ª 42
 
18.3%
Format
ValueCountFrequency (%)
­ 278
100.0%
Modifier Letter
ValueCountFrequency (%)
ˆ 54
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11550550
80.3%
Common 2838792
 
19.7%

Most frequent character per script

Common
ValueCountFrequency (%)
2406536
84.8%
, 133459
 
4.7%
. 124802
 
4.4%
- 35248
 
1.2%
' 31128
 
1.1%
" 11661
 
0.4%
) 10048
 
0.4%
( 10024
 
0.4%
1 9751
 
0.3%
0 8283
 
0.3%
Other values (83) 57852
 
2.0%
Latin
ValueCountFrequency (%)
e 1363841
11.8%
a 940540
 
8.1%
t 934821
 
8.1%
i 851544
 
7.4%
o 829900
 
7.2%
n 822626
 
7.1%
s 767878
 
6.6%
r 744329
 
6.4%
h 600839
 
5.2%
l 478831
 
4.1%
Other values (79) 3215401
27.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14345875
99.7%
None 29052
 
0.2%
Currency Symbols 7550
 
0.1%
Letterlike Symbols 3947
 
< 0.1%
Punctuation 2659
 
< 0.1%
Modifier Letters 259
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2406536
16.8%
e 1363841
 
9.5%
a 940540
 
6.6%
t 934821
 
6.5%
i 851544
 
5.9%
o 829900
 
5.8%
n 822626
 
5.7%
s 767878
 
5.4%
r 744329
 
5.2%
h 600839
 
4.2%
Other values (81) 4083021
28.5%
Currency Symbols
ValueCountFrequency (%)
7550
100.0%
None
ValueCountFrequency (%)
â 7288
25.1%
à 4175
14.4%
Ð 3236
 
11.1%
© 1555
 
5.4%
Ñ 1351
 
4.7%
 705
 
2.4%
œ 688
 
2.4%
¾ 512
 
1.8%
Ä 470
 
1.6%
µ 452
 
1.6%
Other values (62) 8620
29.7%
Letterlike Symbols
ValueCountFrequency (%)
3947
100.0%
Punctuation
ValueCountFrequency (%)
891
33.5%
649
24.4%
382
14.4%
169
 
6.4%
120
 
4.5%
101
 
3.8%
94
 
3.5%
88
 
3.3%
53
 
2.0%
38
 
1.4%
Other values (5) 74
 
2.8%
Modifier Letters
ValueCountFrequency (%)
˜ 205
79.2%
ˆ 54
 
20.8%

popularity
Real number (ℝ)

Distinct: 43734
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9263563
Minimum: 0
Maximum: 547.4883
Zeros: 40
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0208069
Q1: 0.388835
median: 1.130503
Q3: 3.6906865
95-th percentile: 11.063588
Maximum: 547.4883
Range: 547.4883
Interquartile range (IQR): 3.3018515

Descriptive statistics

Standard deviation: 6.009491
Coefficient of variation (CV): 2.0535746
Kurtosis: 1923.7947
Mean: 2.9263563
Median Absolute Deviation (MAD): 0.967653
Skewness: 29.215819
Sum: 132795.12
Variance: 36.113982
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value                  Count   Frequency (%)
1 × 10^-6              56      0.1%
0.000308               42      0.1%
0                      40      0.1%
0.00022                39      0.1%
0.000578               38      0.1%
0.001177               38      0.1%
0.000844               38      0.1%
0.002001               27      0.1%
0.003013               21      < 0.1%
0.00353                19      < 0.1%
Other values (43724)   45021   99.2%

Minimum 10 values

Value       Count   Frequency (%)
0           40      0.1%
1 × 10^-6   56      0.1%
2 × 10^-6   6       < 0.1%
3 × 10^-6   6       < 0.1%
4 × 10^-6   5       < 0.1%
5 × 10^-6   1       < 0.1%
6 × 10^-6   2       < 0.1%
7 × 10^-6   1       < 0.1%
8 × 10^-6   6       < 0.1%
9 × 10^-6   2       < 0.1%

Maximum 10 values

Value        Count   Frequency (%)
547.488298   1       < 0.1%
294.337037   1       < 0.1%
287.253654   1       < 0.1%
228.032744   1       < 0.1%
213.849907   1       < 0.1%
187.860492   1       < 0.1%
185.330992   1       < 0.1%
185.070892   1       < 0.1%
183.870374   1       < 0.1%
154.801009   1       < 0.1%

production_companies
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 22670
Distinct (%): 67.5%
Missing: 11796
Missing (%): 26.0%
Memory size: 3.8 MiB

['Metro-Goldwyn-Mayer (MGM)']                742
['Warner Bros.']                             540
['Paramount Pictures']                       505
['Twentieth Century Fox Film Corporation']   439
['Universal Pictures']                       320
Other values (22665)                         31037

Length

Max length: 681
Median length: 450
Mean length: 47.919483
Min length: 6

Characters and Unicode

Total characters: 1609280
Distinct characters: 164
Distinct categories: 18
Distinct scripts: 2
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20303
Unique (%): 60.5%

Sample

1st row: ['Pixar Animation Studios']
2nd row: ['TriStar Pictures', 'Teitler Film', 'Interscope Communications']
3rd row: ['Warner Bros.', 'Lancaster Gate']
4th row: ['Twentieth Century Fox Film Corporation']
5th row: ['Sandollar Productions', 'Touchstone Pictures']

Common Values

Value                                        Count   Frequency (%)
['Metro-Goldwyn-Mayer (MGM)']                742     1.6%
['Warner Bros.']                             540     1.2%
['Paramount Pictures']                       505     1.1%
['Twentieth Century Fox Film Corporation']   439     1.0%
['Universal Pictures']                       320     0.7%
['RKO Radio Pictures']                       247     0.5%
['Columbia Pictures Corporation']            207     0.5%
['Columbia Pictures']                        146     0.3%
['Mosfilm']                                  145     0.3%
['Walt Disney Pictures']                     85      0.2%
Other values (22660)                         30207   66.6%
(Missing)                                    11796   26.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
films 9455
 
5.3%
pictures 9267
 
5.2%
productions 9062
 
5.1%
film 6680
 
3.8%
entertainment 5155
 
2.9%
corporation 2189
 
1.2%
company 1769
 
1.0%
warner 1478
 
0.8%
bros 1411
 
0.8%
the 1382
 
0.8%
Other values (18625) 129856
73.1%

Most occurring characters

ValueCountFrequency (%)
144134
 
9.0%
' 140681
 
8.7%
i 106952
 
6.6%
e 94664
 
5.9%
n 89981
 
5.6%
o 85308
 
5.3%
r 83559
 
5.2%
t 83450
 
5.2%
a 77424
 
4.8%
s 62678
 
3.9%
Other values (154) 640449
39.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 982481
61.1%
Uppercase Letter 204923
 
12.7%
Other Punctuation 187273
 
11.6%
Space Separator 144134
 
9.0%
Open Punctuation 37968
 
2.4%
Close Punctuation 37909
 
2.4%
Decimal Number 4564
 
0.3%
Dash Punctuation 4373
 
0.3%
Other Symbol 3256
 
0.2%
Math Symbol 830
 
0.1%
Other values (8) 1569
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 106952
10.9%
e 94664
9.6%
n 89981
9.2%
o 85308
8.7%
r 83559
8.5%
t 83450
8.5%
a 77424
 
7.9%
s 62678
 
6.4%
l 51272
 
5.2%
m 44281
 
4.5%
Other values (34) 202912
20.7%
Uppercase Letter
ValueCountFrequency (%)
P 27885
13.6%
F 26364
12.9%
C 20588
 
10.0%
M 13363
 
6.5%
S 11915
 
5.8%
E 9747
 
4.8%
A 9549
 
4.7%
T 9360
 
4.6%
B 9002
 
4.4%
G 7813
 
3.8%
Other values (29) 59337
29.0%
Other Punctuation
ValueCountFrequency (%)
' 140681
75.1%
, 37363
 
20.0%
. 5671
 
3.0%
" 987
 
0.5%
& 764
 
0.4%
/ 645
 
0.3%
\ 357
 
0.2%
¡ 330
 
0.2%
134
 
0.1%
§ 129
 
0.1%
Other values (14) 212
 
0.1%
Decimal Number
ValueCountFrequency (%)
2 1035
22.7%
0 748
16.4%
1 743
16.3%
3 556
12.2%
4 481
10.5%
8 232
 
5.1%
9 224
 
4.9%
6 195
 
4.3%
5 178
 
3.9%
7 172
 
3.8%
Other Number
ValueCountFrequency (%)
³ 422
58.4%
¼ 188
26.0%
½ 63
 
8.7%
¾ 29
 
4.0%
² 12
 
1.7%
¹ 8
 
1.1%
Other Symbol
ValueCountFrequency (%)
© 3176
97.5%
° 58
 
1.8%
14
 
0.4%
® 6
 
0.2%
¦ 2
 
0.1%
Modifier Symbol
ValueCountFrequency (%)
´ 155
39.1%
¨ 142
35.9%
¸ 57
 
14.4%
¯ 31
 
7.8%
˜ 11
 
2.8%
Currency Symbol
ValueCountFrequency (%)
¤ 153
63.2%
£ 41
 
16.9%
21
 
8.7%
¢ 15
 
6.2%
¥ 12
 
5.0%
Open Punctuation
ValueCountFrequency (%)
[ 33592
88.5%
( 4318
 
11.4%
34
 
0.1%
24
 
0.1%
Math Symbol
ValueCountFrequency (%)
+ 661
79.6%
± 160
 
19.3%
¬ 8
 
1.0%
| 1
 
0.1%
Final Punctuation
ValueCountFrequency (%)
» 33
46.5%
27
38.0%
6
 
8.5%
5
 
7.0%
Initial Punctuation
ValueCountFrequency (%)
9
42.9%
« 7
33.3%
3
 
14.3%
2
 
9.5%
Dash Punctuation
ValueCountFrequency (%)
- 4329
99.0%
36
 
0.8%
8
 
0.2%
Close Punctuation
ValueCountFrequency (%)
] 33592
88.6%
) 4317
 
11.4%
Other Letter
ValueCountFrequency (%)
º 75
73.5%
ª 27
 
26.5%
Space Separator
ValueCountFrequency (%)
144134
100.0%
Modifier Letter
ValueCountFrequency (%)
ˆ 11
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1187437
73.8%
Common 421843
 
26.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 106952
 
9.0%
e 94664
 
8.0%
n 89981
 
7.6%
o 85308
 
7.2%
r 83559
 
7.0%
t 83450
 
7.0%
a 77424
 
6.5%
s 62678
 
5.3%
l 51272
 
4.3%
m 44281
 
3.7%
Other values (74) 407868
34.3%
Common
ValueCountFrequency (%)
144134
34.2%
' 140681
33.3%
, 37363
 
8.9%
] 33592
 
8.0%
[ 33592
 
8.0%
. 5671
 
1.3%
- 4329
 
1.0%
( 4318
 
1.0%
) 4317
 
1.0%
© 3176
 
0.8%
Other values (70) 10670
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1597028
99.2%
None 11921
 
0.7%
Punctuation 274
 
< 0.1%
Modifier Letters 22
 
< 0.1%
Currency Symbols 21
 
< 0.1%
Letterlike Symbols 14
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
144134
 
9.0%
' 140681
 
8.8%
i 106952
 
6.7%
e 94664
 
5.9%
n 89981
 
5.6%
o 85308
 
5.3%
r 83559
 
5.2%
t 83450
 
5.2%
a 77424
 
4.8%
s 62678
 
3.9%
Other values (75) 628197
39.3%
None
ValueCountFrequency (%)
à 5560
46.6%
© 3176
26.6%
³ 422
 
3.5%
¡ 330
 
2.8%
Ð 266
 
2.2%
¼ 188
 
1.6%
± 160
 
1.3%
´ 155
 
1.3%
¤ 153
 
1.3%
¨ 142
 
1.2%
Other values (50) 1369
 
11.5%
Punctuation
ValueCountFrequency (%)
95
34.7%
36
 
13.1%
34
 
12.4%
27
 
9.9%
24
 
8.8%
11
 
4.0%
9
 
3.3%
8
 
2.9%
6
 
2.2%
6
 
2.2%
Other values (5) 18
 
6.6%
Currency Symbols
ValueCountFrequency (%)
21
100.0%
Letterlike Symbols
ValueCountFrequency (%)
14
100.0%
Modifier Letters
ValueCountFrequency (%)
˜ 11
50.0%
ˆ 11
50.0%

production_countries
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct2389
Distinct (%)6.1%
Missing6211
Missing (%)13.7%
Memory size3.2 MiB
['United States of America']
17846 
['United Kingdom']
2235 
['France']
 
1653
['Japan']
 
1356
['Italy']
 
1029
Other values (2384)
15049 

Length

Max length289
Median length199
Mean length23.571053
Min length8

Characters and Unicode

Total characters923231
Distinct characters56
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
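As a minimal sketch of how such per-category counts can be obtained, the snippet below uses Python's standard `unicodedata` module to tally the general category of each character in one sample value. The helper name `category_counts` is an illustrative assumption and is not taken from the profiling tool.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Ll', 'Lu', 'Po')."""
    return Counter(unicodedata.category(ch) for ch in text)

# One sample value from this column, taken from the report's "Sample" rows.
sample = "['United States of America']"
counts = category_counts(sample)

# Ll = Lowercase Letter, Lu = Uppercase Letter, Zs = Space Separator,
# Po = Other Punctuation, Ps / Pe = Open / Close Punctuation
print(counts.most_common())
```

The six category codes that appear for this single value correspond to the six distinct categories the report lists for the column.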

Unique

Unique1765
Unique (%)4.5%

Sample

1st row['United States of America']
2nd row['United States of America']
3rd row['United States of America']
4th row['United States of America']
5th row['United States of America']

Common Values

ValueCountFrequency (%)
['United States of America'] 17846
39.3%
['United Kingdom'] 2235
 
4.9%
['France'] 1653
 
3.6%
['Japan'] 1356
 
3.0%
['Italy'] 1029
 
2.3%
['Canada'] 841
 
1.9%
['Germany'] 749
 
1.7%
['India'] 735
 
1.6%
['Russia'] 734
 
1.6%
['United Kingdom', 'United States of America'] 569
 
1.3%
Other values (2379) 11421
25.2%
(Missing) 6211
 
13.7%
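The values above are combinations of countries stored as list-like strings, so the 2389 distinct values count combinations rather than individual countries. A minimal sketch of tallying individual countries follows, assuming each cell is a Python-literal list of strings as the sample rows suggest. The function name `count_countries` and the use of `None` for missing cells are illustrative assumptions.

```python
import ast
from collections import Counter

def count_countries(values):
    """Tally individual countries across list-like strings; None marks a missing cell."""
    tally = Counter()
    for raw in values:
        if raw is None:
            continue  # skip missing cells
        tally.update(ast.literal_eval(raw))  # safely parse the list literal
    return tally

rows = [
    "['United States of America']",
    "['United Kingdom', 'United States of America']",
    "['France']",
    None,
]
country_counts = count_countries(rows)
print(country_counts.most_common(1))
```

`ast.literal_eval` is used instead of `eval` because it only accepts Python literals and never executes arbitrary code.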

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united 25269
21.3%
states 21150
17.8%
of 21149
17.8%
america 21149
17.8%
kingdom 4092
 
3.4%
france 3939
 
3.3%
germany 2260
 
1.9%
italy 2168
 
1.8%
canada 1767
 
1.5%
japan 1649
 
1.4%
Other values (177) 14176
11.9%

Most occurring characters

ValueCountFrequency (%)
' 98825
 
10.7%
e 80657
 
8.7%
79600
 
8.6%
t 72626
 
7.9%
a 70500
 
7.6%
i 58554
 
6.3%
n 47502
 
5.1%
] 39168
 
4.2%
[ 39168
 
4.2%
d 34551
 
3.7%
Other values (46) 302080
32.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 558632
60.5%
Other Punctuation 109082
 
11.8%
Uppercase Letter 97581
 
10.6%
Space Separator 79600
 
8.6%
Close Punctuation 39168
 
4.2%
Open Punctuation 39168
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 80657
14.4%
t 72626
13.0%
a 70500
12.6%
i 58554
10.5%
n 47502
8.5%
d 34551
6.2%
r 32493
5.8%
o 29584
 
5.3%
m 28708
 
5.1%
c 26373
 
4.7%
Other values (16) 77084
13.8%
Uppercase Letter
ValueCountFrequency (%)
U 25370
26.0%
S 23838
24.4%
A 22391
22.9%
K 5219
 
5.3%
F 4334
 
4.4%
I 3585
 
3.7%
C 2596
 
2.7%
G 2473
 
2.5%
J 1665
 
1.7%
R 1307
 
1.3%
Other values (14) 4803
 
4.9%
Other Punctuation
ValueCountFrequency (%)
' 98825
90.6%
, 10247
 
9.4%
" 10
 
< 0.1%
Space Separator
ValueCountFrequency (%)
79600
100.0%
Close Punctuation
ValueCountFrequency (%)
] 39168
100.0%
Open Punctuation
ValueCountFrequency (%)
[ 39168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 656213
71.1%
Common 267018
28.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 80657
12.3%
t 72626
11.1%
a 70500
10.7%
i 58554
 
8.9%
n 47502
 
7.2%
d 34551
 
5.3%
r 32493
 
5.0%
o 29584
 
4.5%
m 28708
 
4.4%
c 26373
 
4.0%
Other values (40) 174665
26.6%
Common
ValueCountFrequency (%)
' 98825
37.0%
79600
29.8%
] 39168
 
14.7%
[ 39168
 
14.7%
, 10247
 
3.8%
" 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 923231
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 98825
 
10.7%
e 80657
 
8.7%
79600
 
8.6%
t 72626
 
7.9%
a 70500
 
7.6%
i 58554
 
6.3%
n 47502
 
5.1%
] 39168
 
4.2%
[ 39168
 
4.2%
d 34551
 
3.7%
Other values (46) 302080
32.7%

release_date
Categorical

Distinct17334
Distinct (%)38.2%
Missing0
Missing (%)0.0%
Memory size2.9 MiB
2008-01-01
 
136
2009-01-01
 
121
2007-01-01
 
118
2005-01-01
 
111
2006-01-01
 
101
Other values (17329)
44792 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters453790
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8571
Unique (%)18.9%

Sample

1st row1995-10-30
2nd row1995-12-15
3rd row1995-12-22
4th row1995-12-22
5th row1995-02-10

Common Values

ValueCountFrequency (%)
2008-01-01 136
 
0.3%
2009-01-01 121
 
0.3%
2007-01-01 118
 
0.3%
2005-01-01 111
 
0.2%
2006-01-01 101
 
0.2%
2002-01-01 96
 
0.2%
2004-01-01 90
 
0.2%
2001-01-01 84
 
0.2%
2003-01-01 76
 
0.2%
1997-01-01 69
 
0.2%
Other values (17324) 44377
97.8%
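All ten of the most common values fall on January 1, which may indicate records where only the year was known. That reading is an assumption, not something the report states. A minimal sketch of parsing these ISO-formatted strings and flagging January 1 dates, with the hypothetical helper `is_january_first`:

```python
from datetime import date

def is_january_first(iso_text: str) -> bool:
    """Return True when an ISO 'YYYY-MM-DD' string falls on January 1."""
    parsed = date.fromisoformat(iso_text)
    return parsed.month == 1 and parsed.day == 1

samples = ["1995-10-30", "2008-01-01", "1995-12-22", "2009-01-01"]

# Derive the year from each string and collect possible placeholder dates.
years = [date.fromisoformat(s).year for s in samples]
flagged = [s for s in samples if is_january_first(s)]
print(years, flagged)
```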

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2008-01-01 136
 
0.3%
2009-01-01 121
 
0.3%
2007-01-01 118
 
0.3%
2005-01-01 111
 
0.2%
2006-01-01 101
 
0.2%
2002-01-01 96
 
0.2%
2004-01-01 90
 
0.2%
2001-01-01 84
 
0.2%
2003-01-01 76
 
0.2%
1997-01-01 69
 
0.2%
Other values (17324) 44377
97.8%

Most occurring characters

ValueCountFrequency (%)
0 97607
21.5%
- 90758
20.0%
1 84059
18.5%
2 52808
11.6%
9 39777
8.8%
3 15435
 
3.4%
8 15280
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14290
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 363032
80.0%
Dash Punctuation 90758
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 97607
26.9%
1 84059
23.2%
2 52808
14.5%
9 39777
11.0%
3 15435
 
4.3%
8 15280
 
4.2%
6 15021
 
4.1%
5 14836
 
4.1%
7 14290
 
3.9%
4 13919
 
3.8%
Dash Punctuation
ValueCountFrequency (%)
- 90758
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 453790
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 97607
21.5%
- 90758
20.0%
1 84059
18.5%
2 52808
11.6%
9 39777
8.8%
3 15435
 
3.4%
8 15280
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14290
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 453790
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 97607
21.5%
- 90758
20.0%
1 84059
18.5%
2 52808
11.6%
9 39777
8.8%
3 15435
 
3.4%
8 15280
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14290
 
3.1%

revenue
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct6863
Distinct (%)15.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean11229357
Minimum0
Maximum2.7879651 × 10^9
Zeros37972
Zeros (%)83.7%
Negative0
Negative (%)0.0%
Memory size354.6 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile48018459
Maximum2.7879651 × 10^9
Range2.7879651 × 10^9
Interquartile range (IQR)0

Descriptive statistics

Standard deviation64387893
Coefficient of variation (CV)5.7338897
Kurtosis237.09288
Mean11229357
Median Absolute Deviation (MAD)0
Skewness12.255124
Sum5.0957698 × 10^11
Variance4.1458008 × 10^15
MonotonicityNot monotonic
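The descriptive statistics above are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A minimal sketch that recomputes both from the rounded figures in the report (the variable names are illustrative):

```python
# Rounded values copied from the report for the revenue variable.
mean = 11229357.0
std = 64387893.0

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation

print(f"CV = {cv:.4f}")
print(f"variance = {variance:.4e}")
```

Because the inputs are rounded, the results agree with the reported values only to the displayed precision.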
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 37972
83.7%
12000000 20
 
< 0.1%
10000000 19
 
< 0.1%
11000000 19
 
< 0.1%
2000000 18
 
< 0.1%
6000000 17
 
< 0.1%
5000000 14
 
< 0.1%
8000000 13
 
< 0.1%
500000 13
 
< 0.1%
1 12
 
< 0.1%
Other values (6853) 7262
 
16.0%
ValueCountFrequency (%)
0 37972
83.7%
1 12
 
< 0.1%
2 3
 
< 0.1%
3 9
 
< 0.1%
4 4
 
< 0.1%
5 5
 
< 0.1%
6 2
 
< 0.1%
7 4
 
< 0.1%
8 5
 
< 0.1%
9 1
 
< 0.1%
ValueCountFrequency (%)
2787965087 1
< 0.1%
2068223624 1
< 0.1%
1845034188 1
< 0.1%
1519557910 1
< 0.1%
1513528810 1
< 0.1%
1506249360 1
< 0.1%
1405403694 1
< 0.1%
1342000000 1
< 0.1%
1274219009 1
< 0.1%
1262886337 1
< 0.1%

runtime
Real number (ℝ)

Distinct353
Distinct (%)0.8%
Missing246
Missing (%)0.5%
Infinite0
Infinite (%)0.0%
Mean94.181043
Minimum0
Maximum1256
Zeros1535
Zeros (%)3.4%
Negative0
Negative (%)0.0%
Memory size354.6 KiB

Quantile statistics

Minimum0
5-th percentile12
Q185
median95
Q3107
95-th percentile138
Maximum1256
Range1256
Interquartile range (IQR)22
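The quartiles above can be used to screen for extreme runtimes. The sketch below applies the common 1.5 × IQR fence rule. This rule is an illustrative choice and is not something the report itself computes, and the helper name `is_outlier` is hypothetical.

```python
# Quartiles copied from the report for the runtime variable.
q1, q3 = 85.0, 107.0

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are flagged
upper_fence = q3 + 1.5 * iqr     # values above this are flagged

def is_outlier(runtime: float) -> bool:
    """Flag a runtime that lies outside the 1.5 * IQR fences."""
    return runtime < lower_fence or runtime > upper_fence

print(iqr, lower_fence, upper_fence)
```

Under this rule both the 1535 zero-valued runtimes and the maximum of 1256 fall outside the fences.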

Descriptive statistics

Standard deviation38.340053
Coefficient of variation (CV)0.40708886
Kurtosis93.929568
Mean94.181043
Median Absolute Deviation (MAD)11
Skewness4.4908332
Sum4250673
Variance1469.9597
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
90 2549
 
5.6%
0 1535
 
3.4%
100 1470
 
3.2%
95 1410
 
3.1%
93 1214
 
2.7%
96 1104
 
2.4%
92 1079
 
2.4%
94 1062
 
2.3%
91 1055
 
2.3%
88 1030
 
2.3%
Other values (343) 31625
69.7%
ValueCountFrequency (%)
0 1535
3.4%
1 107
 
0.2%
2 33
 
0.1%
3 48
 
0.1%
4 50
 
0.1%
5 51
 
0.1%
6 72
 
0.2%
7 103
 
0.2%
8 78
 
0.2%
9 63
 
0.1%
ValueCountFrequency (%)
1256 1
< 0.1%
1140 2
< 0.1%
931 1
< 0.1%
925 1
< 0.1%
900 1
< 0.1%
877 1
< 0.1%
874 1
< 0.1%
840 2
< 0.1%
780 1
< 0.1%
720 1
< 0.1%

spoken_languages
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct1932
Distinct (%)4.3%
Missing0
Missing (%)0.0%
Memory size5.4 MiB
[{'iso_639_1': 'en', 'name': 'English'}]
22381 
[]
3768 
[{'iso_639_1': 'fr', 'name': 'Français'}]
 
1852
[{'iso_639_1': 'ja', 'name': '日本語'}]
 
1290
[{'iso_639_1': 'it', 'name': 'Italiano'}]
 
1217
Other values (1927)
14871 

Length

Max length783
Median length40
Mean length48.47987
Min length2

Characters and Unicode

Total characters2199968
Distinct characters140
Distinct categories20
Distinct scripts2
Distinct blocks6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1367
Unique (%)3.0%

Sample

1st row[{'iso_639_1': 'en', 'name': 'English'}]
2nd row[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]
3rd row[{'iso_639_1': 'en', 'name': 'English'}]
4th row[{'iso_639_1': 'en', 'name': 'English'}]
5th row[{'iso_639_1': 'en', 'name': 'English'}]

Common Values

ValueCountFrequency (%)
[{'iso_639_1': 'en', 'name': 'English'}] 22381
49.3%
[] 3768
 
8.3%
[{'iso_639_1': 'fr', 'name': 'Français'}] 1852
 
4.1%
[{'iso_639_1': 'ja', 'name': '日本語'}] 1290
 
2.8%
[{'iso_639_1': 'it', 'name': 'Italiano'}] 1217
 
2.7%
[{'iso_639_1': 'es', 'name': 'Español'}] 901
 
2.0%
[{'iso_639_1': 'ru', 'name': 'Pусский'}] 807
 
1.8%
[{'iso_639_1': 'de', 'name': 'Deutsch'}] 761
 
1.7%
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}] 681
 
1.5%
[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}] 572
 
1.3%
Other values (1922) 11149
24.6%
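Each value in this column is a list of small records with an `iso_639_1` code and a display name, and 3768 rows hold the empty list `[]` rather than a missing value. A minimal sketch of extracting just the language codes, assuming every cell is a Python-literal list of dictionaries as shown in the sample rows (`language_codes` is a hypothetical helper):

```python
import ast

def language_codes(raw: str) -> list:
    """Parse a list-of-dicts string and return its ISO 639-1 codes in order."""
    entries = ast.literal_eval(raw)
    return [entry["iso_639_1"] for entry in entries]

single = "[{'iso_639_1': 'en', 'name': 'English'}]"
pair = "[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]"
empty = "[]"

print(language_codes(single), language_codes(pair), language_codes(empty))
```

Working with the codes avoids depending on the display names, several of which contain non-ASCII characters.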

Length

Histogram of lengths of the category
ValueCountFrequency (%)
iso_639_1 53277
24.3%
name 53277
24.3%
en 28731
13.1%
english 28730
13.1%
4785
 
2.2%
fr 4194
 
1.9%
français 4194
 
1.9%
deutsch 2624
 
1.2%
de 2624
 
1.2%
es 2412
 
1.1%
Other values (206) 34228
15.6%

Most occurring characters

ValueCountFrequency (%)
' 426214
19.4%
172909
 
7.9%
n 120548
 
5.5%
_ 106554
 
4.8%
: 106554
 
4.8%
s 99178
 
4.5%
i 94077
 
4.3%
e 92705
 
4.2%
a 75201
 
3.4%
, 64943
 
3.0%
Other values (130) 841085
38.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 780194
35.5%
Other Punctuation 608131
27.6%
Decimal Number 213134
 
9.7%
Space Separator 173697
 
7.9%
Connector Punctuation 106554
 
4.8%
Open Punctuation 99762
 
4.5%
Close Punctuation 98654
 
4.5%
Uppercase Letter 71738
 
3.3%
Currency Symbol 8896
 
0.4%
Math Symbol 6772
 
0.3%
Other values (10) 32436
 
1.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 120548
15.5%
s 99178
12.7%
i 94077
12.1%
e 92705
11.9%
a 75201
9.6%
o 61230
7.8%
m 53989
6.9%
l 36034
 
4.6%
h 33814
 
4.3%
g 30513
 
3.9%
Other values (31) 82905
10.6%
Uppercase Letter
ValueCountFrequency (%)
E 31199
43.5%
à 8203
 
11.4%
Ð 5430
 
7.6%
Ñ 5002
 
7.0%
F 4196
 
5.8%
D 2926
 
4.1%
P 2677
 
3.7%
I 2366
 
3.3%
Ø 2301
 
3.2%
Î 1704
 
2.4%
Other values (25) 5734
 
8.0%
Other Punctuation
ValueCountFrequency (%)
' 426214
70.1%
: 106554
 
17.5%
, 64943
 
10.7%
§ 5520
 
0.9%
¿ 1352
 
0.2%
· 1202
 
0.2%
/ 1015
 
0.2%
¡ 584
 
0.1%
542
 
0.1%
68
 
< 0.1%
Other values (3) 137
 
< 0.1%
Other Number
ValueCountFrequency (%)
¹ 3525
69.3%
½ 550
 
10.8%
² 469
 
9.2%
¼ 249
 
4.9%
³ 173
 
3.4%
¾ 122
 
2.4%
Control
ValueCountFrequency (%)
 3488
55.8%
 1263
 
20.2%
 925
 
14.8%
 575
 
9.2%
 2
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
¥ 3209
36.1%
¤ 3007
33.8%
1602
18.0%
£ 559
 
6.3%
¢ 519
 
5.8%
Modifier Symbol
ValueCountFrequency (%)
¸ 2735
41.3%
¯ 1429
21.6%
¨ 1335
20.2%
´ 1083
 
16.4%
˜ 33
 
0.5%
Other Symbol
ValueCountFrequency (%)
® 1345
24.8%
° 1092
20.2%
© 1026
19.0%
1009
18.6%
¦ 942
17.4%
Decimal Number
ValueCountFrequency (%)
9 53303
25.0%
1 53277
25.0%
3 53277
25.0%
6 53277
25.0%
Open Punctuation
ValueCountFrequency (%)
{ 53277
53.4%
[ 45379
45.5%
1059
 
1.1%
47
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
± 3660
54.0%
¬ 2037
30.1%
× 1075
 
15.9%
Space Separator
ValueCountFrequency (%)
172909
99.5%
  788
 
0.5%
Close Punctuation
ValueCountFrequency (%)
} 53276
54.0%
] 45378
46.0%
Other Letter
ValueCountFrequency (%)
ª 2591
56.3%
º 2008
43.7%
Dash Punctuation
ValueCountFrequency (%)
2089
79.4%
542
 
20.6%
Final Punctuation
ValueCountFrequency (%)
» 994
97.8%
22
 
2.2%
Connector Punctuation
ValueCountFrequency (%)
_ 106554
100.0%
Format
ValueCountFrequency (%)
­ 542
100.0%
Initial Punctuation
ValueCountFrequency (%)
215
100.0%
Modifier Letter
ValueCountFrequency (%)
ˆ 63
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1344226
61.1%
Latin 855742
38.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 120548
14.1%
s 99178
11.6%
i 94077
11.0%
e 92705
10.8%
a 75201
8.8%
o 61230
 
7.2%
m 53989
 
6.3%
l 36034
 
4.2%
h 33814
 
4.0%
E 31199
 
3.6%
Other values (67) 157767
18.4%
Common
ValueCountFrequency (%)
' 426214
31.7%
172909
12.9%
_ 106554
 
7.9%
: 106554
 
7.9%
, 64943
 
4.8%
9 53303
 
4.0%
{ 53277
 
4.0%
1 53277
 
4.0%
3 53277
 
4.0%
6 53277
 
4.0%
Other values (53) 200641
14.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2085498
94.8%
None 107118
 
4.9%
Punctuation 4645
 
0.2%
Currency Symbols 1602
 
0.1%
Letterlike Symbols 1009
 
< 0.1%
Modifier Letters 96
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 426214
20.4%
172909
 
8.3%
n 120548
 
5.8%
_ 106554
 
5.1%
: 106554
 
5.1%
s 99178
 
4.8%
i 94077
 
4.5%
e 92705
 
4.4%
a 75201
 
3.6%
, 64943
 
3.1%
Other values (52) 726615
34.8%
None
ValueCountFrequency (%)
à 8203
 
7.7%
à 6780
 
6.3%
§ 5520
 
5.2%
Ð 5430
 
5.1%
Ñ 5002
 
4.7%
æ 4308
 
4.0%
± 3660
 
3.4%
¹ 3525
 
3.3%
è 3495
 
3.3%
 3488
 
3.3%
Other values (55) 57707
53.9%
Punctuation
ValueCountFrequency (%)
2089
45.0%
1059
22.8%
542
 
11.7%
542
 
11.7%
215
 
4.6%
68
 
1.5%
61
 
1.3%
47
 
1.0%
22
 
0.5%
Currency Symbols
ValueCountFrequency (%)
1602
100.0%
Letterlike Symbols
ValueCountFrequency (%)
1009
100.0%
Modifier Letters
ValueCountFrequency (%)
ˆ 63
65.6%
˜ 33
34.4%

status
Categorical

Distinct6
Distinct (%)< 0.1%
Missing81
Missing (%)0.2%
Memory size2.8 MiB
Released
44938 
Rumored
 
230
Post Production
 
97
In Production
 
19
Planned
 
13

Length

Max length15
Median length8
Mean length8.0117224
Min length7

Characters and Unicode

Total characters362915
Distinct characters18
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowReleased
2nd rowReleased
3rd rowReleased
4th rowReleased
5th rowReleased

Common Values

ValueCountFrequency (%)
Released 44938
99.0%
Rumored 230
 
0.5%
Post Production 97
 
0.2%
In Production 19
 
< 0.1%
Planned 13
 
< 0.1%
Canceled 1
 
< 0.1%
(Missing) 81
 
0.2%
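The report's alerts flag this variable as highly imbalanced (97.0%). One common way to score imbalance is one minus the normalized Shannon entropy of the class frequencies. Whether the profiling tool uses exactly this formula is an assumption here, although the sketch below does reproduce the reported figure from the counts above. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(frequencies):
    """Return 1 - H/H_max, where H is the Shannon entropy of the class shares."""
    total = sum(frequencies)
    probs = [f / total for f in frequencies if f > 0]
    if len(probs) < 2:
        return 0.0  # a single class has no defined imbalance in this sketch
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(probs))

# Non-missing class counts copied from the table above.
status_counts = {
    "Released": 44938,
    "Rumored": 230,
    "Post Production": 97,
    "In Production": 19,
    "Planned": 13,
    "Canceled": 1,
}
score = imbalance_score(status_counts.values())
print(f"imbalance = {score:.3f}")
```

A score of 0 would mean all classes are equally frequent, and a score near 1 means a single class dominates, as `Released` does here.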

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
released 44938
99.0%
rumored 230
 
0.5%
production 116
 
0.3%
post 97
 
0.2%
in 19
 
< 0.1%
planned 13
 
< 0.1%
canceled 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 135059
37.2%
d 45298
 
12.5%
R 45168
 
12.4%
s 45035
 
12.4%
l 44952
 
12.4%
a 44952
 
12.4%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
Other values (8) 970
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 317385
87.5%
Uppercase Letter 45414
 
12.5%
Space Separator 116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 135059
42.6%
d 45298
 
14.3%
s 45035
 
14.2%
l 44952
 
14.2%
a 44952
 
14.2%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
t 213
 
0.1%
Other values (3) 395
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 45168
99.5%
P 226
 
0.5%
I 19
 
< 0.1%
C 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
116
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 362799
> 99.9%
Common 116
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 135059
37.2%
d 45298
 
12.5%
R 45168
 
12.4%
s 45035
 
12.4%
l 44952
 
12.4%
a 44952
 
12.4%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
Other values (7) 854
 
0.2%
Common
ValueCountFrequency (%)
116
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 362915
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 135059
37.2%
d 45298
 
12.5%
R 45168
 
12.4%
s 45035
 
12.4%
l 44952
 
12.4%
a 44952
 
12.4%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
Other values (8) 970
 
0.3%

tagline
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct20269
Distinct (%)99.4%
Missing24981
Missing (%)55.0%
Memory size2.8 MiB
Based on a true story.
 
7
Trust no one.
 
4
Be careful what you wish for.
 
4
-
 
4
Who is John Galt?
 
3
Other values (20264)
20376 

Length

Max length302
Median length206
Mean length47.030199
Min length1

Characters and Unicode

Total characters959322
Distinct characters160
Distinct categories20
Distinct scripts2
Distinct blocks6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique20163
Unique (%)98.8%

Sample

1st rowRoll the dice and unleash the excitement!
2nd rowStill Yelling. Still Fighting. Still Ready for Love.
3rd rowFriends are the people who let you be yourself... and never let you forget it.
4th rowJust When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th rowA Los Angeles Crime Saga

Common Values

ValueCountFrequency (%)
Based on a true story. 7
 
< 0.1%
Trust no one. 4
 
< 0.1%
Be careful what you wish for. 4
 
< 0.1%
- 4
 
< 0.1%
Who is John Galt? 3
 
< 0.1%
Drama 3
 
< 0.1%
Classic Albums 3
 
< 0.1%
There are two sides to every love story. 3
 
< 0.1%
There is no turning back 3
 
< 0.1%
Documentary 3
 
< 0.1%
Other values (20259) 20361
44.9%
(Missing) 24981
55.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 10998
 
6.3%
a 6814
 
3.9%
of 4405
 
2.5%
to 3584
 
2.1%
is 2796
 
1.6%
in 2693
 
1.5%
and 2682
 
1.5%
you 2389
 
1.4%
1580
 
0.9%
for 1522
 
0.9%
Other values (15101) 134454
77.3%

Most occurring characters

ValueCountFrequency (%)
153667
16.0%
e 94403
 
9.8%
t 57260
 
6.0%
o 56559
 
5.9%
a 51468
 
5.4%
n 47495
 
5.0%
i 46033
 
4.8%
r 44988
 
4.7%
s 42354
 
4.4%
h 37165
 
3.9%
Other values (150) 327930
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 680649
71.0%
Space Separator 153669
 
16.0%
Uppercase Letter 75118
 
7.8%
Other Punctuation 44438
 
4.6%
Decimal Number 2687
 
0.3%
Dash Punctuation 1928
 
0.2%
Currency Symbol 340
 
< 0.1%
Other Symbol 261
 
< 0.1%
Open Punctuation 61
 
< 0.1%
Close Punctuation 55
 
< 0.1%
Other values (10) 116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 94403
13.9%
t 57260
 
8.4%
o 56559
 
8.3%
a 51468
 
7.6%
n 47495
 
7.0%
i 46033
 
6.8%
r 44988
 
6.6%
s 42354
 
6.2%
h 37165
 
5.5%
l 30170
 
4.4%
Other values (29) 172754
25.4%
Uppercase Letter
ValueCountFrequency (%)
T 10008
 
13.3%
A 6875
 
9.2%
S 5652
 
7.5%
H 4402
 
5.9%
I 4386
 
5.8%
E 4308
 
5.7%
W 3680
 
4.9%
O 3478
 
4.6%
N 3196
 
4.3%
L 3195
 
4.3%
Other values (26) 25938
34.5%
Other Punctuation
ValueCountFrequency (%)
. 26644
60.0%
! 5785
 
13.0%
' 5672
 
12.8%
, 4224
 
9.5%
? 1161
 
2.6%
" 578
 
1.3%
: 138
 
0.3%
& 83
 
0.2%
* 42
 
0.1%
# 31
 
0.1%
Other values (11) 80
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 802
29.8%
1 516
19.2%
2 299
 
11.1%
9 208
 
7.7%
3 208
 
7.7%
5 168
 
6.3%
4 140
 
5.2%
6 121
 
4.5%
7 121
 
4.5%
8 104
 
3.9%
Math Symbol
ValueCountFrequency (%)
± 8
36.4%
+ 5
22.7%
= 5
22.7%
| 2
 
9.1%
~ 1
 
4.5%
¬ 1
 
4.5%
Other Number
ValueCountFrequency (%)
³ 6
31.6%
½ 5
26.3%
¼ 5
26.3%
² 1
 
5.3%
¹ 1
 
5.3%
¾ 1
 
5.3%
Currency Symbol
ValueCountFrequency (%)
282
82.9%
$ 37
 
10.9%
¤ 18
 
5.3%
¥ 2
 
0.6%
£ 1
 
0.3%
Other Symbol
ValueCountFrequency (%)
¦ 150
57.5%
85
32.6%
© 20
 
7.7%
® 4
 
1.5%
° 2
 
0.8%
Control
ValueCountFrequency (%)
 16
61.5%
 4
 
15.4%
 3
 
11.5%
 2
 
7.7%
 1
 
3.8%
Open Punctuation
ValueCountFrequency (%)
( 49
80.3%
[ 7
 
11.5%
3
 
4.9%
2
 
3.3%
Initial Punctuation
ValueCountFrequency (%)
10
71.4%
« 2
 
14.3%
1
 
7.1%
1
 
7.1%
Modifier Symbol
ValueCountFrequency (%)
˜ 5
41.7%
¯ 3
25.0%
´ 2
 
16.7%
¨ 2
 
16.7%
Dash Punctuation
ValueCountFrequency (%)
- 1924
99.8%
3
 
0.2%
1
 
0.1%
Final Punctuation
ValueCountFrequency (%)
8
80.0%
1
 
10.0%
» 1
 
10.0%
Space Separator
ValueCountFrequency (%)
153667
> 99.9%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 48
87.3%
] 7
 
12.7%
Other Letter
ValueCountFrequency (%)
ª 1
50.0%
º 1
50.0%
Format
ValueCountFrequency (%)
­ 5
100.0%
Modifier Letter
ValueCountFrequency (%)
ˆ 5
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 755768
78.8%
Common 203554
 
21.2%

Most frequent character per script

Common
ValueCountFrequency (%)
153667
75.5%
. 26644
 
13.1%
! 5785
 
2.8%
' 5672
 
2.8%
, 4224
 
2.1%
- 1924
 
0.9%
? 1161
 
0.6%
0 802
 
0.4%
" 578
 
0.3%
1 516
 
0.3%
Other values (74) 2581
 
1.3%
Latin
ValueCountFrequency (%)
e 94403
 
12.5%
t 57260
 
7.6%
o 56559
 
7.5%
a 51468
 
6.8%
n 47495
 
6.3%
i 46033
 
6.1%
r 44988
 
6.0%
s 42354
 
5.6%
h 37165
 
4.9%
l 30170
 
4.0%
Other values (66) 247873
32.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 958146
99.9%
None 759
 
0.1%
Currency Symbols 282
 
< 0.1%
Letterlike Symbols 85
 
< 0.1%
Punctuation 40
 
< 0.1%
Modifier Letters 10
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
153667
16.0%
e 94403
 
9.9%
t 57260
 
6.0%
o 56559
 
5.9%
a 51468
 
5.4%
n 47495
 
5.0%
i 46033
 
4.8%
r 44988
 
4.7%
s 42354
 
4.4%
h 37165
 
3.9%
Other values (77) 326754
34.1%
Currency Symbols
ValueCountFrequency (%)
282
100.0%
None
ValueCountFrequency (%)
â 281
37.0%
¦ 150
19.8%
à 85
 
11.2%
© 20
 
2.6%
œ 19
 
2.5%
¤ 18
 
2.4%
 16
 
2.1%
 10
 
1.3%
Ä 9
 
1.2%
ã 9
 
1.2%
Other values (46) 142
18.7%
Letterlike Symbols
ValueCountFrequency (%)
85
100.0%
Punctuation
ValueCountFrequency (%)
10
25.0%
8
20.0%
4
 
10.0%
3
 
7.5%
3
 
7.5%
2
 
5.0%
2
 
5.0%
2
 
5.0%
2
 
5.0%
1
 
2.5%
Other values (3) 3
 
7.5%
Modifier Letters
ValueCountFrequency (%)
˜ 5
50.0%
ˆ 5
50.0%

title
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct42195
Distinct (%)93.0%
Missing1
Missing (%)< 0.1%
Memory size3.2 MiB
Cinderella
 
11
Alice in Wonderland
 
9
Hamlet
 
9
Les Misérables
 
8
Beauty and the Beast
 
8
Other values (42190)
45333 

Length

Max length131
Median length82
Mean length16.738904
Min length1

Characters and Unicode

Total characters759578
Distinct characters166
Distinct categories20
Distinct scripts2
Distinct blocks6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique39865
Unique (%)87.9%

Sample

1st rowToy Story
2nd rowJumanji
3rd rowGrumpier Old Men
4th rowWaiting to Exhale
5th rowFather of the Bride Part II

Common Values

ValueCountFrequency (%)
Cinderella 11
 
< 0.1%
Alice in Wonderland 9
 
< 0.1%
Hamlet 9
 
< 0.1%
Les Misérables 8
 
< 0.1%
Beauty and the Beast 8
 
< 0.1%
Blackout 7
 
< 0.1%
The Three Musketeers 7
 
< 0.1%
A Christmas Carol 7
 
< 0.1%
Treasure Island 7
 
< 0.1%
The Promise 6
 
< 0.1%
Other values (42185) 45299
99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 14556
 
10.7%
of 4930
 
3.6%
a 2242
 
1.6%
in 1692
 
1.2%
and 1631
 
1.2%
to 1054
 
0.8%
758
 
0.6%
man 666
 
0.5%
love 663
 
0.5%
for 601
 
0.4%
Other values (24361) 107401
78.9%

Most occurring characters

ValueCountFrequency (%)
90830
 
12.0%
e 76251
 
10.0%
a 48949
 
6.4%
o 45673
 
6.0%
n 40819
 
5.4%
r 40021
 
5.3%
i 39767
 
5.2%
t 36724
 
4.8%
s 29521
 
3.9%
h 28522
 
3.8%
Other values (156) 282501
37.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 533017
70.2%
Uppercase Letter 118697
 
15.6%
Space Separator 90867
 
12.0%
Other Punctuation 10676
 
1.4%
Decimal Number 3856
 
0.5%
Dash Punctuation 975
 
0.1%
Other Symbol 321
 
< 0.1%
Currency Symbol 281
 
< 0.1%
Other Number 224
 
< 0.1%
Modifier Symbol 150
 
< 0.1%
Other values (10) 514
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 16021
 
13.5%
S 10338
 
8.7%
M 8033
 
6.8%
B 7658
 
6.5%
C 7165
 
6.0%
A 6786
 
5.7%
D 6335
 
5.3%
L 5871
 
4.9%
H 5170
 
4.4%
W 5166
 
4.4%
Other values (31) 40154
33.8%
Lowercase Letter
ValueCountFrequency (%)
e 76251
14.3%
a 48949
9.2%
o 45673
 
8.6%
n 40819
 
7.7%
r 40021
 
7.5%
i 39767
 
7.5%
t 36724
 
6.9%
s 29521
 
5.5%
h 28522
 
5.4%
l 25925
 
4.9%
Other values (27) 120845
22.7%
Other Punctuation
ValueCountFrequency (%)
: 3718
34.8%
' 2505
23.5%
. 1607
15.1%
, 1134
 
10.6%
! 647
 
6.1%
& 458
 
4.3%
? 269
 
2.5%
/ 78
 
0.7%
55
 
0.5%
¡ 50
 
0.5%
Other values (13) 155
 
1.5%
Decimal Number
ValueCountFrequency (%)
2 863
22.4%
1 695
18.0%
0 624
16.2%
3 482
12.5%
9 230
 
6.0%
4 229
 
5.9%
5 224
 
5.8%
7 193
 
5.0%
8 160
 
4.1%
6 156
 
4.0%
Currency Symbol
ValueCountFrequency (%)
¤ 131
46.6%
95
33.8%
¥ 18
 
6.4%
$ 18
 
6.4%
£ 10
 
3.6%
¢ 9
 
3.2%
Other Number
ValueCountFrequency (%)
¼ 52
23.2%
³ 46
20.5%
½ 46
20.5%
¾ 34
15.2%
² 26
11.6%
¹ 20
 
8.9%
Other Symbol
ValueCountFrequency (%)
© 221
68.8%
° 42
 
13.1%
42
 
13.1%
¦ 9
 
2.8%
® 7
 
2.2%
Modifier Symbol
ValueCountFrequency (%)
´ 56
37.3%
¨ 54
36.0%
¸ 25
16.7%
¯ 10
 
6.7%
˜ 5
 
3.3%
Control
ValueCountFrequency (%)
 27
64.3%
 8
 
19.0%
 3
 
7.1%
 3
 
7.1%
 1
 
2.4%
Open Punctuation
ValueCountFrequency (%)
( 80
60.2%
34
25.6%
14
 
10.5%
[ 5
 
3.8%
Math Symbol
ValueCountFrequency (%)
± 70
70.7%
+ 16
 
16.2%
¬ 12
 
12.1%
= 1
 
1.0%
Final Punctuation
ValueCountFrequency (%)
» 24
61.5%
8
 
20.5%
4
 
10.3%
3
 
7.7%
Initial Punctuation
ValueCountFrequency (%)
21
70.0%
3
 
10.0%
3
 
10.0%
« 3
 
10.0%
Dash Punctuation
ValueCountFrequency (%)
- 968
99.3%
4
 
0.4%
3
 
0.3%
Space Separator
ValueCountFrequency (%)
90830
> 99.9%
  37
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 82
94.3%
] 5
 
5.7%
Other Letter
ValueCountFrequency (%)
º 28
73.7%
ª 10
 
26.3%
Format
ValueCountFrequency (%)
­ 37
100.0%
Modifier Letter
ValueCountFrequency (%)
ˆ 6
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 651711
85.8%
Common 107867
 
14.2%

Most frequent character per script

Common
ValueCountFrequency (%)
90830
84.2%
: 3718
 
3.4%
' 2505
 
2.3%
. 1607
 
1.5%
, 1134
 
1.1%
- 968
 
0.9%
2 863
 
0.8%
1 695
 
0.6%
! 647
 
0.6%
0 624
 
0.6%
Other values (77) 4276
 
4.0%
Latin
ValueCountFrequency (%)
e 76251
 
11.7%
a 48949
 
7.5%
o 45673
 
7.0%
n 40819
 
6.3%
r 40021
 
6.1%
i 39767
 
6.1%
t 36724
 
5.6%
s 29521
 
4.5%
h 28522
 
4.4%
l 25925
 
4.0%
Other values (69) 239539
36.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 756361
99.6%
None 2923
 
0.4%
Punctuation 146
 
< 0.1%
Currency Symbols 95
 
< 0.1%
Letterlike Symbols 42
 
< 0.1%
Modifier Letters 11
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
90830
 
12.0%
e 76251
 
10.1%
a 48949
 
6.5%
o 45673
 
6.0%
n 40819
 
5.4%
r 40021
 
5.3%
i 39767
 
5.3%
t 36724
 
4.9%
s 29521
 
3.9%
h 28522
 
3.8%
Other values (75) 279284
36.9%
None
ValueCountFrequency (%)
à 817
28.0%
Ð 249
 
8.5%
© 221
 
7.6%
¤ 131
 
4.5%
Î 121
 
4.1%
Ñ 97
 
3.3%
â 72
 
2.5%
± 70
 
2.4%
Ä 61
 
2.1%
´ 56
 
1.9%
Other values (52) 1028
35.2%
Currency Symbols
ValueCountFrequency (%)
95
100.0%
Letterlike Symbols
ValueCountFrequency (%)
42
100.0%
Punctuation
ValueCountFrequency (%)
34
23.3%
21
14.4%
17
11.6%
14
9.6%
14
9.6%
9
 
6.2%
8
 
5.5%
7
 
4.8%
4
 
2.7%
4
 
2.7%
Other values (5) 14
9.6%
Modifier Letters
ValueCountFrequency (%)
ˆ 6
54.5%
˜ 5
45.5%

vote_average
Real number (ℝ)

Distinct92
Distinct (%)0.2%
Missing1
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean5.6240755
Minimum0
Maximum10
Zeros2947
Zeros (%)6.5%
Negative0
Negative (%)0.0%
Memory size354.6 KiB

Quantile statistics

Minimum0
5-th percentile0
Q15
median6
Q36.8
95-th percentile7.8
Maximum10
Range10
Interquartile range (IQR)1.8

Descriptive statistics

Standard deviation1.9154019
Coefficient of variation (CV)0.34057187
Kurtosis2.5420774
Mean5.6240755
Median Absolute Deviation (MAD)0.9
Skewness-1.5244617
Sum255209.3
Variance3.6687645
MonotonicityNot monotonic
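The 2947 zero ratings pull the mean down. If those zeros stand for unrated titles rather than genuine scores (an assumption, not something the report states), the mean over rated titles can be recovered from the summary figures alone. A minimal sketch using the reported sum and the dataset's 45379 observations:

```python
# Figures copied from the report for the vote_average variable.
total_sum = 255209.3
n_rows = 45379      # number of observations in the dataset
n_missing = 1
n_zeros = 2947

n_valid = n_rows - n_missing
mean_all = total_sum / n_valid                  # matches the reported mean
mean_nonzero = total_sum / (n_valid - n_zeros)  # zeros add nothing to the sum

print(f"mean (all) = {mean_all:.4f}")
print(f"mean (non-zero) = {mean_nonzero:.4f}")
```

Dropping the zeros from the denominator is valid because they contribute nothing to the sum.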
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2947
 
6.5%
6 2463
 
5.4%
5 1998
 
4.4%
7 1884
 
4.2%
6.5 1722
 
3.8%
6.3 1603
 
3.5%
5.5 1381
 
3.0%
5.8 1368
 
3.0%
6.4 1350
 
3.0%
6.7 1342
 
3.0%
Other values (82) 27320
60.2%
ValueCountFrequency (%)
0 2947
6.5%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 103
 
0.2%
1.1 1
 
< 0.1%
1.2 4
 
< 0.1%
1.3 13
 
< 0.1%
1.4 5
 
< 0.1%
1.5 30
 
0.1%
1.6 6
 
< 0.1%
ValueCountFrequency (%)
10 185
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 18
 
< 0.1%
9.4 3
 
< 0.1%
9.3 18
 
< 0.1%
9.2 4
 
< 0.1%
9.1 2
 
< 0.1%
9 158
0.3%
8.9 7
 
< 0.1%

release_year
Real number (ℝ)

Distinct: 135
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1991.8822
Minimum: 1874
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 1874
5-th percentile: 1941
Q1: 1978
median: 2001
Q3: 2010
95-th percentile: 2015
Maximum: 2020
Range: 146
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 24.054986
Coefficient of variation (CV): 0.01207651
Kurtosis: 0.84032964
Mean: 1991.8822
Median Absolute Deviation (MAD): 12
Skewness: -1.2249397
Sum: 90389624
Variance: 578.64235
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2014                1975   4.4%
2015                1905   4.2%
2013                1889   4.2%
2012                1723   3.8%
2011                1667   3.7%
2016                1604   3.5%
2009                1586   3.5%
2010                1501   3.3%
2008                1473   3.2%
2007                1320   2.9%
Other values (125)  28736  63.3%
Value  Count  Frequency (%)
1874   1      < 0.1%
1878   1      < 0.1%
1883   1      < 0.1%
1887   1      < 0.1%
1888   2      < 0.1%
1890   5      < 0.1%
1891   6      < 0.1%
1892   3      < 0.1%
1893   1      < 0.1%
1894   13     < 0.1%
Value  Count  Frequency (%)
2020   1      < 0.1%
2018   5      < 0.1%
2017   532    1.2%
2016   1604   3.5%
2015   1905   4.2%
2014   1975   4.4%
2013   1889   4.2%
2012   1723   3.8%
2011   1667   3.7%
2010   1501   3.3%

return
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 5232
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 659.99915
Minimum: 0
Maximum: 12396383
Zeros: 39998
Zeros (%): 88.1%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2.5353413
Maximum: 12396383
Range: 12396383
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 74690.825
Coefficient of variation (CV): 113.16806
Kurtosis: 20674.324
Mean: 659.99915
Median Absolute Deviation (MAD): 0
Skewness: 138.3341
Sum: 29950101
Variance: 5.5787194 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    39998  88.1%
1                    20     < 0.1%
2                    12     < 0.1%
4                    11     < 0.1%
5                    8      < 0.1%
3                    7      < 0.1%
2.5                  7      < 0.1%
1.333333333          7      < 0.1%
1.5                  6      < 0.1%
7                    4      < 0.1%
Other values (5222)  5299   11.7%
Value                Count  Frequency (%)
0                    39998  88.1%
5.217391304 × 10^-7  1      < 0.1%
7.5 × 10^-7          1      < 0.1%
9.375 × 10^-7        1      < 0.1%
1.499133126 × 10^-6  1      < 0.1%
1.8 × 10^-6          1      < 0.1%
1.916666667 × 10^-6  1      < 0.1%
3.5 × 10^-6          1      < 0.1%
4 × 10^-6            1      < 0.1%
5.111111111 × 10^-6  1      < 0.1%
Value        Count  Frequency (%)
12396383     1      < 0.1%
8500000      1      < 0.1%
4197476.625  1      < 0.1%
2755584      1      < 0.1%
1018619.283  1      < 0.1%
1000000      1      < 0.1%
26881.72043  1      < 0.1%
12890.38667  1      < 0.1%
5330.33945   1      < 0.1%
4133.333333  1      < 0.1%
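The report does not state how return is derived. The sample rows in the Sample section are consistent with a revenue-to-budget ratio that is set to 0 when the budget is 0, which would also explain the 88.1% zeros. The sketch below works under that assumption; `compute_return` is a hypothetical helper, and the budget and revenue figures are copied from the first sample rows.

```python
def compute_return(revenue: float, budget: float) -> float:
    """Revenue-to-budget ratio; 0.0 when the budget is zero (assumed convention)."""
    if budget == 0:
        return 0.0
    return revenue / budget


# (title, budget, revenue) taken from the first rows of the Sample section.
SAMPLE_ROWS = [
    ("Toy Story", 30000000.0, 373554033.0),
    ("Jumanji", 65000000.0, 262797249.0),
    ("Father of the Bride Part II", 0.0, 76578911.0),
]

for title, budget, revenue in SAMPLE_ROWS:
    print(f"{title}: {compute_return(revenue, budget):.6f}")
```

Under this reading, the largest values in the table above would come from rows with very small recorded budgets rather than from unusually large revenues.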

Interactions

[64 interaction plots; images not reproduced]

Correlations

                   budget      id  popularity  revenue  runtime  vote_average  release_year  return  original_language  status
budget              1.000  -0.255       0.463    0.644    0.227         0.072         0.141   0.775              0.000   0.000
id                 -0.255   1.000      -0.410   -0.278   -0.205        -0.149         0.392  -0.262              0.071   0.056
popularity          0.463  -0.410       1.000    0.491    0.307         0.241         0.186   0.447              0.000   0.000
revenue             0.644  -0.278       0.491    1.000    0.254         0.127         0.103   0.853              0.000   0.000
runtime             0.227  -0.205       0.307    0.254    1.000         0.193         0.034   0.234              0.111   0.000
vote_average        0.072  -0.149       0.241    0.127    0.193         1.000        -0.009   0.120              0.070   0.019
release_year        0.141   0.392       0.186    0.103    0.034        -0.009         1.000   0.087              0.144   0.028
return              0.775  -0.262       0.447    0.853    0.234         0.120         0.087   1.000              0.000   0.000
original_language   0.000   0.071       0.000    0.000    0.111         0.070         0.144   0.000              1.000   0.000
status              0.000   0.056       0.000    0.000    0.000         0.019         0.028   0.000              0.000   1.000
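To read the matrix above quickly, the pairs with the largest coefficients can be listed programmatically. The sketch below copies the coefficients from the matrix and keeps each pair whose absolute value reaches a cutoff. The cutoff of 0.5 and the helper name `strong_pairs` are assumptions made for illustration; the report does not state the threshold behind its high-correlation alerts.

```python
# Names and coefficients copied from the correlation matrix above.
NAMES = ["budget", "id", "popularity", "revenue", "runtime",
         "vote_average", "release_year", "return",
         "original_language", "status"]
MATRIX = [
    [1.000, -0.255, 0.463, 0.644, 0.227, 0.072, 0.141, 0.775, 0.000, 0.000],
    [-0.255, 1.000, -0.410, -0.278, -0.205, -0.149, 0.392, -0.262, 0.071, 0.056],
    [0.463, -0.410, 1.000, 0.491, 0.307, 0.241, 0.186, 0.447, 0.000, 0.000],
    [0.644, -0.278, 0.491, 1.000, 0.254, 0.127, 0.103, 0.853, 0.000, 0.000],
    [0.227, -0.205, 0.307, 0.254, 1.000, 0.193, 0.034, 0.234, 0.111, 0.000],
    [0.072, -0.149, 0.241, 0.127, 0.193, 1.000, -0.009, 0.120, 0.070, 0.019],
    [0.141, 0.392, 0.186, 0.103, 0.034, -0.009, 1.000, 0.087, 0.144, 0.028],
    [0.775, -0.262, 0.447, 0.853, 0.234, 0.120, 0.087, 1.000, 0.000, 0.000],
    [0.000, 0.071, 0.000, 0.000, 0.111, 0.070, 0.144, 0.000, 1.000, 0.000],
    [0.000, 0.056, 0.000, 0.000, 0.000, 0.019, 0.028, 0.000, 0.000, 1.000],
]


def strong_pairs(names, matrix, threshold=0.5):
    """Return each pair once (upper triangle) with |coefficient| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


for first, second, value in strong_pairs(NAMES, MATRIX):
    print(f"{first} ~ {second}: {value:.3f}")
```

With this cutoff only budget, revenue and return are paired with one another, which matches the three high-correlation alerts in the report overview.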

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
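One common way to obtain a nullity correlation is to turn each column into a 0/1 presence indicator and take the Pearson correlation of the indicators. The sketch below does this in plain Python on a few made-up records. The records, the helper names and the choice of Pearson are assumptions for illustration, since the report does not say how its heatmap was computed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a column that is always present or always missing
    return cov / sqrt(var_x * var_y)


def nullity_correlation(records, col_a, col_b):
    """Correlate the presence (1) or absence (0) of two columns."""
    present_a = [0 if row[col_a] is None else 1 for row in records]
    present_b = [0 if row[col_b] is None else 1 for row in records]
    return pearson(present_a, present_b)


# Hypothetical records; None marks a missing cell.
RECORDS = [
    {"production_companies": "A", "production_countries": "US", "tagline": None},
    {"production_companies": None, "production_countries": None, "tagline": "x"},
    {"production_companies": "B", "production_countries": "FR", "tagline": "y"},
    {"production_companies": None, "production_countries": None, "tagline": None},
]

print(nullity_correlation(RECORDS, "production_companies", "production_countries"))
print(nullity_correlation(RECORDS, "production_companies", "tagline"))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.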

Sample

index | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | release_year | return
0 | Toy Story Collection | 30000000.0 | ['Animation', 'Comedy', 'Family'] | 862 | en | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences. | 21.946943 | ['Pixar Animation Studios'] | ['United States of America'] | 1995-10-30 | 373554033.0 | 81.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Toy Story | 7.7 | 1995 | 12.451801
1 | NaN | 65000000.0 | ['Adventure', 'Fantasy', 'Family'] | 8844 | en | When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures. | 17.015539 | ['TriStar Pictures', 'Teitler Film', 'Interscope Communications'] | ['United States of America'] | 1995-12-15 | 262797249.0 | 104.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}] | Released | Roll the dice and unleash the excitement! | Jumanji | 6.9 | 1995 | 4.043035
2 | Grumpy Old Men Collection | 0.0 | ['Romance', 'Comedy'] | 15602 | en | A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max. | 11.712900 | ['Warner Bros.', 'Lancaster Gate'] | ['United States of America'] | 1995-12-22 | 0.0 | 101.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Still Yelling. Still Fighting. Still Ready for Love. | Grumpier Old Men | 6.5 | 1995 | 0.000000
3 | NaN | 16000000.0 | ['Comedy', 'Drama', 'Romance'] | 31357 | en | Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe. | 3.859495 | ['Twentieth Century Fox Film Corporation'] | ['United States of America'] | 1995-12-22 | 81452156.0 | 127.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Friends are the people who let you be yourself... and never let you forget it. | Waiting to Exhale | 6.1 | 1995 | 5.090760
4 | Father of the Bride Collection | 0.0 | ['Comedy'] | 11862 | en | Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own. | 8.387519 | ['Sandollar Productions', 'Touchstone Pictures'] | ['United States of America'] | 1995-02-10 | 76578911.0 | 106.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Just When His World Is Back To Normal... He's In For The Surprise Of His Life! | Father of the Bride Part II | 5.7 | 1995 | 0.000000
5 | NaN | 60000000.0 | ['Action', 'Crime', 'Drama', 'Thriller'] | 949 | en | Obsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence. | 17.924927 | ['Regency Enterprises', 'Forward Pass', 'Warner Bros.'] | ['United States of America'] | 1995-12-15 | 187436818.0 | 170.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}] | Released | A Los Angeles Crime Saga | Heat | 7.7 | 1995 | 3.123947
6 | NaN | 58000000.0 | ['Comedy', 'Romance'] | 11860 | en | An ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it. | 6.677277 | ['Paramount Pictures', 'Scott Rudin Productions', 'Mirage Enterprises', 'Sandollar Productions', 'Constellation Entertainment', 'Worldwide', 'Mont Blanc Entertainment GmbH'] | ['Germany', 'United States of America'] | 1995-12-15 | 0.0 | 127.0 | [{'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'en', 'name': 'English'}] | Released | You are cordially invited to the most surprising merger of the year. | Sabrina | 6.2 | 1995 | 0.000000
7 | NaN | 0.0 | ['Action', 'Adventure', 'Drama', 'Family'] | 45325 | en | A mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence. | 2.561161 | ['Walt Disney Pictures'] | ['United States of America'] | 1995-12-22 | 0.0 | 97.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'de', 'name': 'Deutsch'}] | Released | The Original Bad Boys. | Tom and Huck | 5.4 | 1995 | 0.000000
8 | NaN | 35000000.0 | ['Action', 'Adventure', 'Thriller'] | 9091 | en | International action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer... | 5.231580 | ['Universal Pictures', 'Imperial Entertainment', 'Signature Entertainment'] | ['United States of America'] | 1995-12-22 | 64350171.0 | 106.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Terror goes into overtime. | Sudden Death | 5.5 | 1995 | 1.838576
9 | James Bond Collection | 58000000.0 | ['Adventure', 'Action', 'Thriller'] | 710 | en | James Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain. | 14.686036 | ['United Artists', 'Eon Productions'] | ['United Kingdom', 'United States of America'] | 1995-11-16 | 352194034.0 | 130.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'ru', 'name': 'Pусский'}, {'iso_639_1': 'es', 'name': 'Español'}] | Released | No limits. No fears. No substitutes. | GoldenEye | 6.6 | 1995 | 6.072311
index | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | release_year | return
45369 | NaN | 0.0 | NaN | 67179 | it | Sentenced to life imprisonment for illegal activities, Italian International member Giulio Manieri holds on to his political ideals while struggling against madness in the loneliness of his prison cell. | 0.225051 | NaN | NaN | 1972-01-01 | 0.0 | 90.0 | [{'iso_639_1': 'it', 'name': 'Italiano'}] | Released | NaN | St. Michael Had a Rooster | 6.0 | 1972 | 0.0
45370 | NaN | 0.0 | ['Horror', 'Mystery', 'Thriller'] | 84419 | en | An unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics. | 0.222814 | ['Universal Pictures'] | ['United States of America'] | 1946-03-29 | 0.0 | 65.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Meet...The CREEPER! | House of Horrors | 6.3 | 1946 | 0.0
45371 | NaN | 0.0 | ['Mystery', 'Horror'] | 390959 | en | In this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2". | 0.076061 | NaN | NaN | 2000-10-22 | 0.0 | 45.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Shadow of the Blair Witch | 7.0 | 2000 | 0.0
45372 | NaN | 0.0 | ['Horror'] | 289923 | en | A film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch. | 0.386450 | ['Neptune Salad Entertainment', 'Pirie Productions'] | ['United States of America'] | 2000-10-03 | 0.0 | 30.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Do you know what happened 50 years before "The Blair Witch Project"? | The Burkittsville 7 | 7.0 | 2000 | 0.0
45373 | NaN | 0.0 | ['Science Fiction'] | 222848 | en | It's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy! | 0.661558 | ['Concorde-New Horizons'] | ['United States of America'] | 1995-01-01 | 0.0 | 85.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Caged Heat 3000 | 3.5 | 1995 | 0.0
45374 | NaN | 0.0 | ['Drama', 'Action', 'Romance'] | 30840 | en | Yet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone. | 5.683753 | ['Westdeutscher Rundfunk (WDR)', 'Working Title Films', '20th Century Fox Television', 'CanWest Global Communications'] | ['Canada', 'Germany', 'United Kingdom', 'United States of America'] | 1991-05-13 | 0.0 | 104.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Robin Hood | 5.7 | 1991 | 0.0
45375 | NaN | 0.0 | ['Drama'] | 111109 | tl | An artist struggles to finish his work while a storyline about a cult plays in his head. | 0.178241 | ['Sine Olivia'] | ['Philippines'] | 2011-11-17 | 0.0 | 360.0 | [{'iso_639_1': 'tl', 'name': ''}] | Released | NaN | Century of Birthing | 9.0 | 2011 | 0.0
45376 | NaN | 0.0 | ['Action', 'Drama', 'Thriller'] | 67758 | en | When one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ... | 0.903007 | ['American World Pictures'] | ['United States of America'] | 2003-08-01 | 0.0 | 90.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | A deadly game of wits. | Betrayal | 3.8 | 2003 | 0.0
45377 | NaN | 0.0 | NaN | 227506 | en | In a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship. | 0.003503 | ['Yermoliev'] | ['Russia'] | 1917-10-21 | 0.0 | 87.0 | [] | Released | NaN | Satan Triumphant | 0.0 | 1917 | 0.0
45378 | NaN | 0.0 | NaN | 461257 | en | 50 years after decriminalisation of homosexuality in the UK, director Daisy Asquith mines the jewels of the BFI archive to take us into the relationships, desires, fears and expressions of gay men and women in the 20th century. | 0.163015 | NaN | ['United Kingdom'] | 2017-06-09 | 0.0 | 75.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Queerama | 0.0 | 2017 | 0.0

Duplicate rows

Most frequently occurring

index | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | release_year | return | # duplicates
14 | NaN | 0.0 | ['Thriller', 'Mystery'] | 141971 | fi | Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 0.411949 | ['Filmiteollisuus Fine'] | ['Finland'] | 2008-12-26 | 0.0 | 108.0 | [{'iso_639_1': 'fi', 'name': 'suomi'}] | Released | Which one is the first to return - memory or the murderer? | Blackout | 6.7 | 2008 | 0.0 | 3
0 | Why We Fight | 0.0 | ['Documentary'] | 159849 | en | The third film of Frank Capra's 'Why We Fight" propaganda film series, dealing with the Nazi conquest of Western Europe in 1940. | 0.473322 | NaN | ['United States of America'] | 1943-01-01 | 0.0 | 57.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Why We Fight: Divide and Conquer | 5.0 | 1943 | 0.0 | 2
1 | NaN | 0.0 | ['Action', 'Drama', 'Romance', 'Adventure'] | 99080 | en | Originally called White Thunder, American producer Varick Frissell's 1931 film was inspired by his love for the Canadian Arctic Circle. Set in a beautifully black-and-white filmed Newfoundland, it is the story of a rivalry between two seal hunters that plays out on the ice floes during a hunt. Unsatisfied with the first cut, Frissell arranged for the crew to accompany an actual Newfoundland seal hunt on The SS Viking, on which an explosion of dynamite (carried regularly at the time on Arctic ships to combat ice jams) killed many members of the crew, including Frissell. The film was renamed in honor of the dead. | 0.002362 | NaN | NaN | 1931-06-21 | 0.0 | 70.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Actually produced during the Great Newfoundland Seal Hunt and You see the REAL thing | The Viking | 0.0 | 1931 | 0.0 | 2
2 | NaN | 0.0 | ['Action', 'Horror', 'Science Fiction'] | 18440 | en | When a comet strikes Earth and kicks up a cloud of toxic dust, hundreds of humans join the ranks of the living dead. But there's bad news for the survivors: The newly minted zombies are hell-bent on eradicating every last person from the planet. For the few human beings who remain, going head to head with the flesh-eating fiends is their only chance for long-term survival. Yet their battle will be dark and cold, with overwhelming odds. | 1.436085 | NaN | ['United States of America'] | 2007-01-01 | 0.0 | 89.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Days of Darkness | 5.0 | 2007 | 0.0 | 2
3 | NaN | 0.0 | ['Adventure', 'Animation', 'Drama', 'Action', 'Foreign'] | 23305 | en | In feudal India, a warrior (Khan) who renounces his role as the longtime enforcer to a local lord becomes the prey in a murderous hunt through the Himalayan mountains. | 1.967992 | ['Filmfour'] | ['France', 'Germany', 'India', 'United Kingdom'] | 2001-09-23 | 0.0 | 86.0 | [{'iso_639_1': 'hi', 'name': 'हिन्दी'}] | Released | NaN | The Warrior | 6.3 | 2001 | 0.0 | 2
4 | NaN | 0.0 | ['Comedy', 'Drama'] | 11115 | en | As an ex-gambler teaches a hot-shot college kid some things about playing cards, he finds himself pulled into the world series of poker, where his protégé is his toughest competition. | 6.880365 | ['Andertainment Group', 'Crescent City Pictures', 'Tag Entertainment'] | ['United States of America'] | 2008-01-29 | 0.0 | 85.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Deal | 5.2 | 2008 | 0.0 | 2
5 | NaN | 0.0 | ['Comedy', 'Drama'] | 265189 | sv | While holidaying in the French Alps, a Swedish family deals with acts of cowardliness as an avalanche breaks out. | 12.165685 | ['Motlys', 'Coproduction Office', 'Film i Väst'] | ['Norway', 'Sweden', 'France'] | 2014-08-15 | 1359497.0 | 118.0 | [{'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'no', 'name': 'Norsk'}, {'iso_639_1': 'sv', 'name': 'svenska'}, {'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Force Majeure | 6.8 | 2014 | 0.0 | 2
6 | NaN | 0.0 | ['Comedy'] | 97995 | en | After breaking a mirror in his home, superstitious Max tries to avoid situations which could bring bad luck but in doing so, causes himself the worst luck imaginable. | 0.141558 | ['Max Linder Productions'] | ['United States of America'] | 1921-02-06 | 0.0 | 62.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | NaN | Seven Years Bad Luck | 5.6 | 1921 | 0.0 | 2
7 | NaN | 0.0 | ['Crime', 'Drama', 'Thriller'] | 5511 | fr | Hitman Jef Costello is a perfectionist who always carefully plans his murders and who never gets caught. | 9.091288 | ['Fida cinematografica', 'Compagnie Industrielle et Commerciale Cinématographique (CICC)', 'TC Productions', 'Filmel'] | ['France', 'Italy'] | 1967-10-25 | 39481.0 | 105.0 | [{'iso_639_1': 'fr', 'name': 'Français'}] | Released | There is no solitude greater than that of the Samurai | Le Samouraï | 7.9 | 1967 | 0.0 | 2
8 | NaN | 0.0 | ['Drama', 'Comedy'] | 168538 | en | In Zola's Paris, an ingenue arrives at a tony bordello: she's Nana, guileless, but quickly learning to use her erotic innocence to get what she wants. She's an actress for a soft-core filmmaker and soon is the most popular courtesan in Paris, parlaying this into a house, bought for her by a wealthy banker. She tosses him and takes up with her neighbor, a count of impeccable rectitude, and with the count's impressionable son. The count is soon fetching sticks like a dog and mortgaging his lands to satisfy her whims. | 1.276602 | ['Cannon Group', 'Metro-Goldwyn-Mayer (MGM)'] | NaN | 1983-06-13 | 0.0 | 92.0 | [] | Released | NaN | Nana, the True Key of Pleasure | 4.7 | 1983 | 0.0 | 2
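A table like the one above can be produced by counting how often each complete row occurs and keeping the rows seen more than once. The sketch below shows this with the standard library, assuming that "# duplicates" is the number of occurrences of the row. The rows are abbreviated to (title, release_year) for brevity, whereas the report compares all 19 columns, and `duplicate_counts` is a hypothetical helper.

```python
from collections import Counter


def duplicate_counts(rows):
    """Return (row, occurrences) for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)


# Abbreviated, illustrative rows: (title, release_year).
ROWS = [
    ("Blackout", 2008),
    ("Deal", 2008),
    ("Blackout", 2008),
    ("Heat", 1995),
    ("Deal", 2008),
    ("Blackout", 2008),
]

for row, occurrences in duplicate_counts(ROWS):
    print(row, occurrences)
```

Rows are converted to tuples so that they can be used as dictionary keys; list-valued fields such as genres would need the same treatment, or can be compared as strings, which is how they appear in this dataset.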